Tuesday, December 31, 2013

NFL Player Tree (Using R)

Inspired by the soccer player tree in an earlier post, I pulled some recent National Football League player attributes from EA Sports's Madden 25 (players rated at least 95).

I used R to do the heavy statistical lifting and used the package ggplot2 to get the nice tree plot.

Without getting into all of the nuances that I did with the soccer version, here is the NFL player tree, color-coded by position type.

By Position Type:

Quick takeaways:
  1. Andy Lee is a punter with similar attributes to top quarterbacks
  2. Cameron Wake is a lineman who may be better as a linebacker
  3. Offensive linemen are the most opposite to the skill players
  4. The tree has three major branches (generalized as follows):
    1. Skill players
    2. Defensive backs/Linebackers
    3. Offensive line

Tuesday, October 8, 2013

Got Tickets? Sorta

The big game starts at 1 pm. You know the local sports pubs will be packed. The common solution is to get there very early, grab a table or two, put coats on adjacent seats, and uncomfortably try to wait out the countdown to kickoff. Are my friends coming? Is the server judging me?

This tactic is costly to your time and the pub's opportunity to seat a lunch customer. Surely there is a better way. What if the pub sold tickets (or it could call them reservations)?

You and a friend each buy a ticket for, say, $6 apiece and get a guaranteed seat starting at 12:50 pm. Is $6 worth the peace of mind of a low-stress seat and the time saved by not hoarding space an hour or more in advance? Probably so. If not, are there enough other people who do think this is a good price? Probably so. If not, maybe the price should be $4. It will work itself out over time, since the pub has a strong incentive to keep the tables full.

This is a win-win, if framed correctly. Maybe the pub could introduce the concept by initially keeping some tables first-come, first-served.

Shhh! Don't Tell 'Em You Want Compromise

With the uproar over the partial government shutdown and the debt ceiling stalemate, there seems to exist a sentiment that federal politicians are somehow not following the wishes of the electorate. Of course the electorate is not a monolith, but the feeling from news programs is that the deviation is not just unrepresentative but completely detached from any of their wishes. Interviews show the public wanting negotiation and compromise to get back to a more stable environment.

Maybe the general public's frustration was all too predictable. Further, maybe the general public unwittingly contributed to the rancor. How?

Politicians know that the American people are likely to demand compromise in the face of prolonged arguments:  You want eight popsicles; we want twelve popsicles; make it ten and get back to business.

So if you are expected to compromise at the end, then the rational decision is to adopt the most radical position initially. Where do you think the eight and twelve came from?

At the extreme, imagine a company that has a history of difficult salary negotiations. In response, the company implements a policy whereby the applicant and the manager split the difference to determine the salary. How much are you going to ask for? How much will your boss ask for? Now we're talking extreme amounts of popsicles.

In a counterfactual universe, the public values not compromising and sticking with one's initial demand. This seems counterintuitive to a public that wants less fighting. But to what outcome would this lead? If you have to stick with your first demand or very close to it AND you know the other guy does as well, then your initial demand will have to be very close to what you believe the fair outcome would be. If your demand is too far from your opponent's then you risk violating what the public values. Since you both understand this, your fight will be over 9.1 and 9.4 popsicles. Boring...just like you like your politicians.

You Cut My Hair Too Short; Perfect!

How do you judge the quality of your haircut? How should you?

It's common to judge the quality of your haircut based on how it looks right after it's been cut, even though you know that it will be weeks before you return for another cut. Given that, it might make more sense to judge your haircut over the entire interval before the next visit.

Imagine one men's haircut that looks perfect today. After four weeks the hair has grown a half-inch, which is noticeably shaggy for the typical male professional hair length. On average, this guy's hair is a quarter-inch too long over the four-week period.

For comparison, imagine if he had gotten the same style haircut but a quarter-inch too short on purpose. Although his hair is a little short on Day 1, by week four his hair has been only an eighth-inch away from ideal on average. Also, he still enjoyed one day of perfect length during the four-week interval, just as in the earlier scenario.

Does this extend to longer hairstyles? It does if the period between cuts is longer, which is often the case. If you wait four months between cuts, then that's two inches of growth. A perfect cut is one inch off on average, while a cut that starts one inch too short is only a half-inch off on average. An important assumption here is that your hairstyle looks equally good/bad whether it's a little too long or a little too short. If not, you can adjust the intentional extra chop to stay at optimal.
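
The averages in these haircut scenarios can be checked numerically. Below is a minimal sketch in R, assuming hair grows at a steady rate and that too long and too short count equally. The function avg_off and its arguments are hypothetical names made up for this illustration.

```r
# Average distance from the ideal length between two cuts.
# offset: how far below ideal the hair is cut on day one (inches)
# growth: total growth before the next cut (inches)
# Assumes steady (linear) growth; names are illustrative only.
avg_off <- function(offset, growth, steps = 100000) {
    t <- (seq_len(steps) - 0.5) / steps   # midpoints across the interval
    mean(abs(growth * t - offset))        # mean absolute gap from ideal
}

four_weeks_perfect <- avg_off(0, 0.5)      # 0.25 inch off on average
four_weeks_short   <- avg_off(0.25, 0.5)   # 0.125 inch off on average
four_months_perfect <- avg_off(0, 2)       # 1 inch off on average
four_months_quarter <- avg_off(0.25, 2)    # about 0.78 inch off on average
four_months_inch    <- avg_off(1, 2)       # 0.5 inch off on average
```

Under these assumptions, the average gap is smallest when the intentional extra chop is half of the expected growth.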

Monday, September 2, 2013

Rather be Good than Lucky

Coaches often say that "winning is everything."  This has the ring of just another cliche that fills many sports interviews, and it's easy to nod in agreement and move on to the next boilerplate response. But, does this point of view actually make sense?

Think about a sports outcome as a rough equation that takes into account skill, strategy, physical attributes, and all the other similar inputs. In real life, we know that there is a random factor as well. Did a player guess correctly? Did the wind change direction at the wrong second? Did the referee make the right call? And on and on.  In statistics, the above equation would have " +  ε" tacked onto the end to represent an error term to account for what the equation does not get right.

From xkcd:
Also, all financial analysis.  And, more directly, D&D.

How about an example:
Say an American football team has worked hard and is expected to score 30 points in an upcoming game, while its opponent is a bit inferior and is expected to score 26 points. Now the error term is, say, a range of up to three points either way for each team. A statistician might say that the first team is clearly better. If the error term is uniformly distributed (to simplify the example), then there are seven potential point totals for each team and, thus, forty-nine scenarios. 88% of the time, the first team would actually win; 6% of the time the second team would get the upset; 6% would be a tie.
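
The forty-nine scenarios can be enumerated directly. This is a small sketch in R, assuming each team's error term takes the whole-number values from -3 to +3 with equal probability, as in the simplified example. The variable names are my own.

```r
# Every pairing of error terms for the two teams: 7 x 7 = 49 scenarios
errors <- -3:3
grid <- expand.grid(e1 = errors, e2 = errors)

score1 <- 30 + grid$e1   # expected 30 points, plus error
score2 <- 26 + grid$e2   # expected 26 points, plus error

outcome <- ifelse(score1 > score2, "team1",
           ifelse(score1 < score2, "team2", "tie"))

counts <- table(outcome)              # 43 wins, 3 upsets, 3 ties
round(100 * counts / nrow(grid))      # 88, 6, and 6 percent
```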

To put this in perspective, imagine a rule change whereby a random number was added to each team's final score to determine the winner. There would be an outcry, and the team with the better original score could claim to be better. This is essentially what happens now, except the random number is hidden in the incalculable seams of wind currents and grass divots and sun glare, etc.

A coach can certainly work on minimizing the chance that an upset might occur based on randomness, but not eliminate it. Thus, it might seem more logical for a coach to focus on improving the team's expected chance of winning without the error term and not worry so much about the random number added to the end of it on any given Sunday. This is because a coach should focus on making the team better going forward; the previous game's outcome cannot be changed.

A statistician would calculate a confidence interval to determine the better team. Not likely to get the same television ratings though!

The University of Alabama football team has enjoyed great success under head coach Nick Saban. A polarizing figure, he seems to understand the above perspective. His comment after winning the National Championship? "My job is to put the players in position to win." Does anyone doubt that Saban would be thankful but scowling and unhappy if his team were to play sub-par yet got some random luck and won? His strategy to maximize the team's success? Follow "The Process," which focuses on improving the team's expected performance while minimizing randomness. Seems to be working pretty well. But given the error term, who really knows?

Thursday, August 29, 2013

Syrian Solace

Much news from Syria has been depressing. Below is a bright poem from a living Syrian poet.


Celebrating Childhood

Even the wind wants
to become a cart
pulled by butterflies.
I remember madness
leaning for the first time
on the mind’s pillow.
I was talking to my body then
and my body was an idea
I wrote in red.
Red is the sun’s most beautiful throne
and all the other colors
worship on red rugs.
Night is another candle.
In every branch, an arm,
a message carried in space
echoed by the body of the wind.
The sun insists on dressing itself in fog
when it meets me:
Am I being scolded by the light?
Oh, my past days—
they used to walk in their sleep
and I used to lean on them.
Love and dreams are two parentheses.
Between them I place my body
and discover the world.
Many times
I saw the air fly with two grass feet
and the road dance with feet made of air.
My wishes are flowers
staining my days.
I was wounded early,
and early I learned
that wounds made me.
I still follow the child
who still walks inside me.
Now he stands at a staircase made of light
searching for a corner to rest in
and to read the face of night again.
If the moon were a house,
my feet would refuse to touch its doorstep.
They are taken by dust
carrying me to the air of seasons.
I walk,
one hand in the air,
the other caressing tresses
that I imagine.
A star is also
a pebble in the field of space.
He alone
who is joined to the horizon
can build new roads.
A moon, an old man,
his seat is night
and light is his walking stick.
What shall I say to the body I abandoned
in the rubble of the house
in which I was born?
No one can narrate my childhood
except those stars that flicker above it
and that leave footprints
on the evening’s path.
My childhood is still
being born in the palms of a light
whose name I do not know
and who names me.
Out of that river he made a mirror
and asked it about his sorrow.
He made rain out of his grief
and imitated the clouds.
Your childhood is a village.
You will never cross its boundaries
no matter how far you go.
His days are lakes,
his memories floating bodies.
You who are descending
from the mountains of the past,
how can you climb them again,
and why?
Time is a door
I cannot open.
My magic is worn,
my chants asleep.
I was born in a village,
small and secretive like a womb.
I never left it.
I love the ocean not the shores.

Soccer English

European soccer leagues have started their seasons and there are more opportunities than ever for watching games from America. Since English words can have different meanings across the Atlantic, here are a few common terms and their American-English equivalents. Any more to add?
  • Football = Soccer
  • Fixture = Game
  • Match = Game
  • Tie = Match
  • Table = Standings
  • Draw = Tie
  • Euro = NIT Tournament
  • Champions League = NCAA Tournament
  • Pitch = Field
  • Mario Balotelli = Dennis Rodman

Tuesday, August 27, 2013

NFL General Manager, You Had One Job!

The National Football League is touted as having parity. Any Given Sunday. Free agency, salary caps, and revenue-sharing are some of the means whereby the better teams are reined in so that anything can happen on any given Sunday. Even teams with poor records are rarely blown out consistently. Advantages from innovation are short-lived due to copy-catting. You would expect an NFL team's success to resemble other generally random functions, replete with mini streaks and constant reversals of fortune.

Deviations from random should correlate with protected attributes such as rookie mandatory salaries. Since the 2002 realignment to 32 teams, pure chance would have a team win the Super Bowl once every 32 years, its conference once every 16 years, and its division once every 4 years. But other teams may concentrate their resources in a certain year, so your team is not always up against a similarly random-quality team. A single season might therefore not be the best interval to consider, since a general manager can optimize over several years with salary cap considerations. In this case, an NFL general manager would be wise to pick a certain year or maybe a two-year range, then time the team's resources to hit on all cylinders. This strategy comes at the cost of having some down years, since the team would be rearranging the timing of its strengths.
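
The random baseline is simple arithmetic. Here is a rough sketch in R, assuming every team is equally likely to win each year and that seasons are independent. Both assumptions, and the ten-year window, are illustrative and not part of the argument above.

```r
teams <- 32         # league size since 2002
conferences <- 2    # 16 teams each
divisions <- 8      # 4 teams each

# Expected years between titles if every team has an equal chance
years_between <- c(super_bowl = teams,
                   conference = teams / conferences,
                   division   = teams / divisions)
years_between

# Chance of at least one Super Bowl win in a ten-year window under pure chance
p_title_decade <- 1 - (1 - 1 / teams)^10
p_title_decade
```

A team that wins noticeably more often than this baseline over a stretch of seasons is a candidate for having timed its peak rather than spreading talent evenly.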

Bottom line: a general manager's main objective is to identify the best season to peak, based on what other teams are likely to do; balancing talent over each year is a recipe for mediocrity when it comes to winning titles.

Wednesday, August 14, 2013

Golf Scramble Simulation in R

This is a simulation of a standard best-ball golf scramble. Conventional wisdom has it that the best golfer (A) should hit last, the idea being that one of the lesser golfers may have a decent shot already so the best golfer can take a risky shot. This simulation suggests that the worst golfer should indeed go first, but after that the order should be best on down (D, A, B, C). Perhaps a rationale is that golfer A will likely make a decent safe shot, which allows the other two medium skilled golfers a chance at a risky shot.

This is one of my first cracks at a sports simulation in R, so I welcome any comments about errors or constructive criticism.

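library(combinat)  # permn() for the hitting-order permutations
## Attaching package: 'combinat'
## The following object is masked from 'package:utils':
## combn
library(ggplot2)  # bar chart of the results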
n <- 10000
Create golfer attributes
safe.attributes <- data.frame(golfer = c("a", "b", "c", "d"), mean = c(8, 7, 
    6, 5), sd = c(1.5, 1.5, 1.5, 1.5))
risk.attributes <- data.frame(golfer = c("a", "b", "c", "d"), mean = c(7, 6, 
    5, 4), sd = c(3, 3, 3, 3))

safe.densities <- apply(safe.attributes[, -1], 1, function(x) sort(rnorm(n = 1000, 
    mean = x[1], sd = x[2])))
colnames(safe.densities) <- safe.attributes$golfer
safe.df <- data.frame(safe.densities)

risk.densities <- apply(risk.attributes[, -1], 1, function(x) sort(rnorm(n = 1000, 
    mean = x[1], sd = x[2])))
colnames(risk.densities) <- risk.attributes$golfer
risk.df <- data.frame(risk.densities)
Plot golfer attributes
par(mfrow = c(2, 2))
par(mar = rep(2, 4))

plot(density(safe.df$a), col = "blue", xlim = c(0, 16), ylim = c(0, 0.3), main = "Golfer A", 
    col.main = "black", font.main = 4)
lines(density(risk.df$a), col = "red")
legend("topright", c("safe", "risk"), cex = 0.8, col = c("blue", "red"), lty = 1)

plot(density(safe.df$b), col = "blue", xlim = c(0, 16), ylim = c(0, 0.3), main = "Golfer B", 
    col.main = "black", font.main = 4)
lines(density(risk.df$b), col = "red")
legend("topright", c("safe", "risk"), cex = 0.8, col = c("blue", "red"), lty = 1)

plot(density(safe.df$c), col = "blue", xlim = c(0, 16), ylim = c(0, 0.3), main = "Golfer C", 
    col.main = "black", font.main = 4)
lines(density(risk.df$c), col = "red")
legend("topright", c("safe", "risk"), cex = 0.8, col = c("blue", "red"), lty = 1)

plot(density(safe.df$d), col = "blue", xlim = c(0, 16), ylim = c(0, 0.3), main = "Golfer D", 
    col.main = "black", font.main = 4)
lines(density(risk.df$d), col = "red")
legend("topright", c("safe", "risk"), cex = 0.8, col = c("blue", "red"), lty = 1)
Create holes dataframe
golfPerms <- permn(letters[1:4])
holes <- data.frame(matrix(NA, nrow = n, length(golfPerms)))

# Name each column after its hitting order, e.g. "dabc"
for (i in 1:length(golfPerms)) {
    colnames(holes)[i] <- paste0(golfPerms[[i]][1], golfPerms[[i]][2], golfPerms[[i]][3], 
        golfPerms[[i]][4])
}

# Play n holes for every order. A golfer takes the risky shot only when the
# best shot so far is already decent (>= 6); the team keeps the best shot.
for (j in 1:n) {
    for (i in 1:length(golfPerms)) {
        shot1 <- sample(safe.df[, substr(golfPerms[[i]][1], 1, 1)], 1, T)

        if (shot1 >= 6) {
            shot2 <- max(shot1, sample(risk.df[, substr(golfPerms[[i]][2], 1, 
                1)], 1, T))
        } else {
            shot2 <- max(shot1, sample(safe.df[, substr(golfPerms[[i]][2], 1, 
                1)], 1, T))
        }

        if (shot2 >= 6) {
            shot3 <- max(shot2, sample(risk.df[, substr(golfPerms[[i]][3], 1, 
                1)], 1, T))
        } else {
            shot3 <- max(shot2, sample(safe.df[, substr(golfPerms[[i]][3], 1, 
                1)], 1, T))
        }

        if (shot3 >= 6) {
            shot4 <- max(shot3, sample(risk.df[, substr(golfPerms[[i]][4], 1, 
                1)], 1, T))
        } else {
            shot4 <- max(shot3, sample(safe.df[, substr(golfPerms[[i]][4], 1, 
                1)], 1, T))
        }

        holes[j, i] <- shot4
    }
}
Find winning order per hole
winners <- data.frame(matrix(NA, nrow = n, ncol = 1))
names(winners) <- "winner"
for (k in 1:n) {
    winners[k, 1] <- colnames(holes)[(which.max(holes[k, ]))]
}
winnerCounts <- data.frame(table(winners))
winnerCounts$winners <- reorder(winnerCounts$winners, -winnerCounts$Freq)
Plot results
par(mfrow = c(1, 1))
ggplot(data = winnerCounts, aes(x = winners, y = Freq)) + geom_bar(colour = "black", 
    fill = "#DD8888", width = 0.7, stat = "identity") + guides(fill = FALSE) + 
    xlab("Order") + ylab("Wins") + ggtitle("Golf Scramble Simulation")
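
For readers who want to experiment with a single hitting order, here is a compact, vectorized sketch of the same decision rule. It is not the code used for the results above: it draws shots directly from the normal distributions instead of sampling from the pre-drawn values, and simulate_order is a hypothetical helper name.

```r
set.seed(1)

# Shot-quality means copied from the golfer attributes in this post
safe_mean <- c(a = 8, b = 7, c = 6, d = 5); safe_sd <- 1.5
risk_mean <- c(a = 7, b = 6, c = 5, d = 4); risk_sd <- 3

# Simulate n holes for one hitting order, e.g. c("d", "a", "b", "c")
simulate_order <- function(order, n = 10000, threshold = 6) {
    best <- rnorm(n, safe_mean[order[1]], safe_sd)   # first golfer plays safe
    for (g in order[-1]) {
        risky <- best >= threshold                   # a decent ball is already in play
        shot <- ifelse(risky,
                       rnorm(n, risk_mean[g], risk_sd),
                       rnorm(n, safe_mean[g], safe_sd))
        best <- pmax(best, shot)                     # the team keeps the better ball
    }
    best
}

dabc <- simulate_order(c("d", "a", "b", "c"))
dcba <- simulate_order(c("d", "c", "b", "a"))
c(dabc = mean(dabc), dcba = mean(dcba))
```

Comparing mean shot quality is a different yardstick from counting per-hole winners, so the ranking of orders may not match the bar chart exactly.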
ddunn801 at gmail dot com

Monday, August 12, 2013

The Double Buzzsaw Problem

The Double Buzzsaw Problem: Allocating the Impact from Simultaneous Percentage Decreases

Link to PDF