Wednesday, April 23, 2014

Dislike Facebook's News Feed

Facebook recently changed its News Feed to rank postings by factors other than chronological posting time. This is a major negative for users. A major source of Facebook's value to people was that one could keep up with all of one's friends. Now, since the posts with the most activity are pushed toward the top, one gets info in the real-world proportion of loudest/most extroverted/most friended. However, quality of content does not necessarily correlate with extroversion. Additionally, a major input to a post's activity is its number of likes. Serious postings and postings revealing bad news are much less likely to get liked because "liking" is an inappropriate response to them. Again, Facebook's algorithm prioritizes easily likable news to the detriment of other posts. Bad move, Facebook.

Friday, April 18, 2014

Insight from FIFA 14’s Player Attributes (Using R)

FIFA 14 is a video game by EA Sports that mimics the experience of managing and playing for a soccer team. The game uses the likenesses and attributes of real players, and this is part of the appeal. Although I rarely play video games, I am an avid soccer player and got curious about what could be learned by taking a closer look at the game-assigned player attributes. is a good source of FIFA 14 data. I scraped the HTML from the two hundred-plus pages of player attributes and then munged it into a useful table. Each player has an overall rating and six specific stats (pace, shooting, passing, dribbling, defending, and heading). Each player also has an assigned position; I collapsed the positions into a “type” category (Defense, Midfield, Forward). The modern game effectively has four lines of players, but the position names still carry the naming conventions of the days of three-line formations, such as 4-4-2.

Player Positions and Position Types

Below is a chart summarizing player rating by position. The chart is sorted by ascending median rating. There is a great deal of spread, but generally the center midfielders and fullbacks are rated a bit lower than the wingers and wingbacks.
The collapsed view below agrees with the above chart: ratings rise slightly as the position type becomes more offensive-minded.

Modeling Player Ratings

I built a linear model for each position “type” and found R-squared values ranging from 88% to 99%. Each model used all six attributes as predictors, with overall rating as the dependent variable. I speculate that player age/experience may account for the unexplained variance. Below is a look at the performance of each position type’s model. Both images visually support the models’ validity.
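For readers who want to try the idea, here is a minimal sketch of fitting one linear model per position type and reporting R-squared. The analysis in this post was done in R; this sketch uses plain Python instead. The players are synthetic, and the generating weights in TRUE_WEIGHTS are made up for illustration. They are not the fitted values from the real data.

```python
import random

ATTRS = ["pace", "shooting", "passing", "dribbling", "defending", "heading"]

# Made-up weights used only to generate synthetic players (not the post's results).
TRUE_WEIGHTS = {
    "Defense": [0.10, 0.00, 0.10, 0.05, 0.60, 0.15],
    "Midfield": [0.10, 0.15, 0.30, 0.30, 0.10, 0.05],
    "Forward": [0.20, 0.45, -0.05, 0.25, 0.00, 0.15],
}

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares fit of y on the attributes plus an intercept (normal equations)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    """beta[0] is the intercept; beta[1:] are the attribute weights."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def r_squared(beta, rows, y):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def synthetic_players(weights, n, rng):
    """Random attribute rows and a noisy rating built from the made-up weights."""
    rows = [[rng.uniform(40, 95) for _ in ATTRS] for _ in range(n)]
    y = [sum(w * v for w, v in zip(weights, r)) + rng.gauss(0, 1.5) for r in rows]
    return rows, y

rng = random.Random(14)
results = {}
for ptype, weights in TRUE_WEIGHTS.items():
    rows, y = synthetic_players(weights, 200, rng)
    beta = fit_ols(rows, y)
    results[ptype] = (beta, r_squared(beta, rows, y))
    print(ptype, "R-squared: %.3f" % results[ptype][1])
```

With real data, the rows would come from the scraped table, split by position type before fitting.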

Position Type Models

Each position type’s model naturally has a different mix of attribute weights. Below are charts showing these weights.
Forwards need to be good at shooting, and this is expressed in the above graph. Interestingly, passing actually carries a negative weight in a forward’s rating. I can think of several great forwards I have played with who fit this category!
Midfield ratings are more balanced than those of defense, but dribbling and passing are the two most important skills for this position type.
It is clear that defending is far and away the most important skill for defenders; this is less an insight than an indictment of the game developers for not breaking defense into its own attributes such as tackling and positioning.
Goalkeepers are specialists so these skills are not as directly relevant, but I included them for completeness.


Each player’s position is assigned in the database. This leaves open the possibility that a player would theoretically be rated higher at a different position type. I found some evidence of this. Below is a table of the top three mismatches by position.
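Assigned Defense, best rating as Midfield: Philipp Lahm, Dani Alves, Marcelo
Assigned Defense, best rating as Forward: Craig Gardner, Guillaume Gillet, Steven Reed
Assigned Midfield, best rating as Defense: Yaya Touré, Sergio Busquets, Xabi Alonso
Assigned Midfield, best rating as Forward: Cristiano Ronaldo, Arjen Robben, Thomas Müller
Assigned Forward, best rating as Defense: Karim Guédé, Lee McCulloch, Mikael Dahlberg
Assigned Forward, best rating as Midfield: Antonio Cassano, Sebast. Giovinco, Neymar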

Defenders better as midfielders are an impressive crew: Lahm, Alves, and Marcelo are three of the top players. Midfielders better as defenders are known for their holding prowess and their enforcer reputation. Midfielders better as forwards are often impressive wingers who can use their speed as a weapon in the open wide spaces. Forwards better as midfielders are represented by two Italians and Neymar; the latter is surprising since he is viewed as a potent striker. As someone who has watched countless matches, I venture that the positions should be thought of in terms of where the player is expected to defend, not necessarily where they are expected to attack; it is common for wingers to cut inside and act like forwards once the opponent’s defenders are occupied by the true forwards. Likewise, the rise of the offensive-minded wing backs can cause trouble for defenses that have to cope with a late runner joining the attack.
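The mismatch search can be sketched as follows: score every player under each position type's model and flag the players whose highest predicted rating comes from a type other than the assigned one. This is plain Python rather than the R used for the post. The coefficients in MODELS and the three players are hypothetical, chosen only to show the mechanics.

```python
ATTRS = ("pace", "shooting", "passing", "dribbling", "defending", "heading")

# Hypothetical coefficients (intercept first, then one weight per attribute).
# These are NOT the fitted values from the post.
MODELS = {
    "Defense": (5.0, 0.10, 0.00, 0.10, 0.05, 0.60, 0.10),
    "Midfield": (5.0, 0.10, 0.15, 0.30, 0.30, 0.05, 0.05),
    "Forward": (5.0, 0.20, 0.45, -0.05, 0.25, 0.00, 0.10),
}

# Hypothetical players: (name, assigned type, attributes in ATTRS order).
PLAYERS = [
    ("Player A", "Defense", (85, 60, 85, 84, 78, 65)),
    ("Player B", "Midfield", (92, 90, 80, 90, 35, 80)),
    ("Player C", "Forward", (70, 85, 60, 75, 30, 80)),
]

def predicted_rating(model, attrs):
    """Apply one position type's linear model to a player's attributes."""
    intercept, *weights = model
    return intercept + sum(w * v for w, v in zip(weights, attrs))

def best_type(attrs):
    """Position type whose model gives this player the highest predicted rating."""
    return max(MODELS, key=lambda t: predicted_rating(MODELS[t], attrs))

# Keep only players whose best type differs from their assigned type.
mismatches = [
    (name, assigned, best_type(attrs))
    for name, assigned, attrs in PLAYERS
    if best_type(attrs) != assigned
]

for name, assigned, best in mismatches:
    print(name, "assigned", assigned, "but rates best as", best)
```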

Model Outliers

The model does a good job of predicting a player’s overall rating, but there are a few exceptions.

                        At Assigned Position     At Best Position
Better than Predicted   Raoul Cedric Loé         Mesut Özil
                        Stefan Reinartz          Franck Ribéry
                                                 Luca Toni
Worse than Predicted    Greg Tempest             Nicholas Gotfredsen
                        Musharraf Al Ruwaili     Don Anding
                        Jacob Shoop              Josh Ford
Most of these players are lesser known, with the exception of the top right box. These players must have magic not captured in the regular six attributes; one might call this the X Factor.
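One simple way to find such outliers is to rank players by residual, meaning actual rating minus predicted rating. The sketch below is plain Python with hypothetical names and numbers. It assumes the predictions have already been produced by a fitted model.

```python
# Hypothetical (name, actual rating, predicted rating) triples for illustration.
PLAYERS = [
    ("Player A", 88, 82.5),
    ("Player B", 61, 66.0),
    ("Player C", 75, 74.6),
    ("Player D", 70, 71.2),
    ("Player E", 84, 79.0),
    ("Player F", 58, 64.5),
]

def residuals(players):
    """(residual, name) pairs sorted from most under-predicted to most over-predicted."""
    return sorted(
        ((actual - predicted, name) for name, actual, predicted in players),
        reverse=True,
    )

def top_outliers(players, n):
    """Names of the n players furthest above and furthest below their prediction."""
    ranked = residuals(players)
    better = [name for _, name in ranked[:n]]
    worse = [name for _, name in ranked[-n:][::-1]]
    return better, worse

better, worse = top_outliers(PLAYERS, 2)
print("Better than predicted:", better)
print("Worse than predicted:", worse)
```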


There is some evidence that the player attributes lead to a few common clusters. Below is a chart showing the within-cluster sum of squares (WSS) for a given cluster count. This is a bit of visual confirmation that there are three or four general styles of player; past that point, adding clusters does not reduce the WSS by much.
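The elbow check described above can be sketched in plain Python: run k-means for several cluster counts and compare the within-cluster sum of squares. The data here are synthetic two-dimensional points drawn around three made-up centers, not player attributes. The farthest-point initialization is a choice made for this sketch so the result is repeatable, not necessarily how the original chart was produced.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wss(points, k, iterations=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Synthetic data: three well-separated blobs (made-up centers, unit spread).
rng = random.Random(7)
BLOB_CENTERS = [(0.0, 0.0), (20.0, 0.0), (10.0, 18.0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in BLOB_CENTERS
    for _ in range(60)
]

wss = {k: kmeans_wss(points, k) for k in range(1, 7)}
for k in sorted(wss):
    print(k, round(wss[k], 1))
```

On data like this, the WSS falls sharply up to three clusters and then flattens, which is the elbow shape the chart is meant to show.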

Player Tree

Finally, I clustered the top field players (overall rating at least 85) hierarchically. What developed was an insightful way to visualize how different players are stylistically related to each other.
Football / Soccer's very own family tree. The forward Gareth Bale is mixed in between the midfielders and defenders. The forward Lionel Messi is mixed in with the midfielders. These are two of the most talked about players today. Maybe being mixed in with different position players in the tree is predictive of being an important, interesting player. If so, keep your eyes on Thomas Müller.
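For the curious, here is a minimal sketch of how such a tree is built. It is plain Python, and the six players and their attribute values are hypothetical. Agglomerative clustering starts with every player alone and repeatedly merges the two closest groups. The sketch uses complete linkage on Euclidean distance; the post does not say which linkage or distance was used, so treat that as an assumption.

```python
import math

# Hypothetical players with six attributes each
# (pace, shooting, passing, dribbling, defending, heading).
DATA = {
    "Forward A": (90, 88, 70, 86, 30, 75),
    "Forward B": (88, 90, 72, 85, 32, 78),
    "Midfielder A": (75, 72, 90, 88, 60, 55),
    "Midfielder B": (74, 70, 88, 89, 62, 58),
    "Defender A": (70, 40, 65, 60, 90, 85),
    "Defender B": (72, 42, 66, 58, 88, 86),
}

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def complete_linkage(a, b, data):
    """Distance between two groups: the largest pairwise distance between their members."""
    return max(euclid(data[i], data[j]) for i in a for j in b)

def agglomerate(data):
    """Return the merges in order as (group_a, group_b, distance)."""
    clusters = [(name,) for name in data]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_linkage(clusters[i], clusters[j], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges

merges = agglomerate(DATA)
for a, b, d in merges:
    print(" + ".join(a), "<->", " + ".join(b), "at distance %.1f" % d)
```

Reading the merges from first to last gives the tree from its leaves up to its root: the earliest merges pair the most similar players, and the last merge joins the two most different groups.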

Saturday, April 5, 2014

Should e-Commerce Ad Spend per Sale Decrease?

Moe's Bar Graph?
You run an e-commerce website with one durable product. With a proud smile, your marketing spend guru, Spike, shares a chart showing how ad spend per unit sold is steadily dropping. He gets confused when he sees a look of frustration on your face. What's the problem? This seems like 100% good news.

Well, Spike is comparing his results to prior months. Seems reasonable. What happens when you compare his results to the optimal scenario, though?

First, let's flesh out what "optimal" might mean. Suppose you know, with certainty, the following:

  1. the list of people who will buy your product,
  2. the marketing mix strategy (content, site, cadence) that will trigger a purchase from each of these people, and
  3. the cost of this ad strategy for each customer.
Knowing all of this and given a monthly budget to spend on marketing, Spike should target sales from the people with the cheapest required marketing mix. Next month, he should target sales from the REMAINING people with the cheapest required marketing mix. These remaining people are, by definition, not as cheap to market to as the first set. Continuing this process, Spike's results would actually show an INCREASING cost per sale!
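
The argument above can be checked with a small sketch in Python. The per-customer marketing costs and the monthly budget are hypothetical numbers. Each month the cheapest remaining customers are bought until the budget runs out, and the cost per sale is recorded.

```python
def greedy_schedule(costs, monthly_budget):
    """Cost per sale for each month when the cheapest remaining customers are targeted first."""
    remaining = sorted(costs)
    cost_per_sale = []
    while remaining:
        spent, sales = 0, 0
        # Keep buying the cheapest remaining customer while the budget allows.
        while remaining and spent + remaining[0] <= monthly_budget:
            spent += remaining.pop(0)
            sales += 1
        if sales == 0:
            break  # the cheapest remaining customer costs more than a month's budget
        cost_per_sale.append(spent / sales)
    return cost_per_sale

# Hypothetical cost of the required marketing mix for each would-be customer.
CUSTOMER_COSTS = [2, 3, 5, 5, 6, 9, 10, 10, 15, 15]

monthly = greedy_schedule(CUSTOMER_COSTS, monthly_budget=30)
print(monthly)  # [5.0, 10.0, 15.0]
```

With these numbers, the first month buys six sales at 5 each on average, the second buys two at 10 each, and the third buys two at 15 each. Under perfect targeting, the cost per sale can only rise.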