Thursday, December 1, 2016

Election Results vs. Benford's Law and the Return of City-States?

From Wikipedia:  Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small...
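
To make that concrete: the law predicts that a leading digit d (1 through 9) shows up with probability log10(1 + 1/d), so a 1 leads about 30% of the time and a 9 less than 5%. Here is a minimal R sketch of the comparison. The `votes` vector is made up for illustration (it is not the election data), and the helper name `leading_digit` is just my own label.

```r
# Expected share of each leading digit (1-9) under Benford's law
benford <- log10(1 + 1 / (1:9))
names(benford) <- 1:9

# Hypothetical helper: first significant digit of each non-zero number
leading_digit <- function(x) {
  x <- abs(x[!is.na(x) & x != 0])
  # Scientific notation puts the first significant digit in position 1
  as.integer(substr(formatC(x, format = "e", digits = 15), 1, 1))
}

# Made-up totals for illustration only (NOT the election data)
votes <- c(1520, 238, 19044, 3120, 127, 45210, 1893, 960, 2711, 14302)

# Observed share of each leading digit, keeping digits that never appear
digits <- leading_digit(votes)
observed <- as.vector(table(factor(digits, levels = 1:9))) / length(digits)
names(observed) <- 1:9

# Side-by-side comparison of expected and observed shares
round(rbind(expected = benford, observed = observed), 3)
```

With only ten made-up numbers the observed row is lumpy, which is the same small-sample problem that makes the low-county states zig-zag.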

Benford's Law is a technique to screen for fraud in a large number of numeric records. Let's apply this to the 2016 U.S. election:


This chart of overall county vote totals by leading digit looks like it passes Benford's Law, but what about state-by-state results?





The states are sorted by number of counties since Benford's law breaks down with fewer records. The last couple of rows of states have zig-zag lines, probably due to this. There are four interesting states that do not have low county counts:

  • Virginia (VA) looks a bit off [Clinton won this state by 5%]
  • Kentucky (KY), Iowa (IA), and Mississippi (MS) look a bit off [Trump won these states by 30%, 10%, and 19%, respectively]
  • The dataset does not have records for Alaska, by the way.

Basically, the states with the oddest results were typically so lopsided as to scarcely matter.

An interesting observation is that the states with more counties clearly tilt toward Trump, while the states with fewer counties clearly tilt toward Clinton. Why might that be? Maybe the number of counties in a state is a proxy for average population size per county. If true, then the Clinton vote should correlate with the county population size. Does it? Indeed, the correlation is 35%.

Perhaps a theme of the election was the divergent preferences between counties with high populations and the rest of the country. That suggests a potential solution: density-based laws, which might effectively be the return of city-states cooperating under one national flag. Individuals could pick their preferred city based on its basket of laws, obviating the imposition of a heavy blanket of laws from a national government that seesaws left and right, alienating an alternating half of its citizens each cycle.

Maybe the Constitution even effectively implemented this flavor of federalism in 1789, since state populations were then the size of present-day cities. Just food for thought; let's not get carried away.

Wednesday, September 14, 2016

Estate Tax: Give + Receive

The Estate Tax (a/k/a Death Tax) inspires conflicting impulses. On one hand, it treats a lifelong spender's income differently from that of a lifelong saver just because there is leftover income in the bank (excellent exposition here from Greg Mankiw). On the other hand, it seems unfair for the lucky recipient of the funds to start off life with so much of a leg up.

How can this apparent contradiction of unfairness be resolved?

Notice that an inheritance is really a transaction with two parts:  a give and a receive. One potential resolution is to remove the tax on giving, but to enact a tax on receiving funds over a certain threshold (say, $50,000).

With this knowledge, the giver would know that the gift is more effective when less concentrated. Expect the giver to give lower amounts to more people. Be careful not to set the threshold too low or the giver would find it impractical to stay under it and so might just ignore its intended dispersal effect.

There may be some unintended consequences so consider this food for thought.

Thursday, August 11, 2016

Form Follows Function, A-N-G-E-?

Five letters
Starts with A-N-G-E

Could either be ANGEL or ANGER

For ANGEL, the G is soft
For ANGER, the G is hard


I'd post a picture of Samael, but don't want to give you nightmares.

Dez Bryant in 9th; Or, How to Improve the Olympics

It's difficult to fully appreciate Olympic-level athletic performances. For example, the top sprinter looks fast but only marginally faster than the competition.

How could the Olympics demonstrate how elite these athletes are?
  • Have Dez Bryant line up beside the 100-meter dash runners!
  • Have the world's best 10-year-old diver dive first, followed by the real competitors!
  • Have LeBron James clean and jerk a couple of plates with strain, while the real weightlifters pick the same up with ease!
  • Have Cristiano Ronaldo swim in an outside lane!

The wide margin between the slowest Olympian and the celebrity would not only clarify the athletes' excellence but also draw a few more eyeballs for the star factor.

Saturday, March 26, 2016

Are You Still There, Pilot? Moral Hazard Mitigation

It is tempting to view an advance in safety technology as an unqualified improvement. However, the advance often comes with some baggage called moral hazard. The idea is that your fancy new airbags might make you feel safer and so you drive a tad riskier, offsetting the improvement.

Inventors may have a duty to think through ways to mitigate this. For example:  aircraft autopilot systems could have a pop-up window at random times that requires the pilot to enter some information in order to continue operating, thus ensuring the pilot is not completely asleep or absent. Analogous concepts can be thought of for other safety technologies.

The Simpsons as a Chart

Inspired by this clever image, I thought I would whip it up in R.
Results:
Below is the R code:
 # Prepare -----------------------------------------------------------------  
 rm(list=ls());gc()  
 pkg <- c("ggplot2")  
 inst <- pkg %in% rownames(installed.packages())  
 if(length(pkg[!inst]) > 0) install.packages(pkg[!inst])  
 lapply(pkg,library,character.only=TRUE)  
 rm(inst,pkg)  
 # Create dataset ----------------------------------------------------------  
 d1 <- data.frame(member=c(rep("Homer",3),  
              rep("Marge",3),  
              rep("Bart",3),  
              rep("Lisa",2),  
              rep("Maggie",2)),  
          shade=c("HomerPants","HomerShirt","Skin",  
              "MargeDress","Skin","MargeHair",  
              "BartShorts","BartShirt","Skin",  
              "LisaDress","Skin",  
              "MaggieOnesie","Skin"),  
          height=c(20,20,25,  
              40,20,40,  
              15,15,18,  
              28,15,  
              18,11))  
 d1$member <- ordered(d1$member,levels=c("Homer","Marge","Bart","Lisa","Maggie"))  
 d1$shade <- ordered(d1$shade,levels=c("HomerPants","HomerShirt","Skin",  
                    "MargeDress","MargeHair",  
                    "BartShorts","BartShirt",  
                    "LisaDress",  
                    "MaggieOnesie"))  
 # Chart the data ----------------------------------------------------------  
 g1 <- ggplot(d1,aes(x=member,y=height,fill=shade)) +   
  geom_bar(stat="identity") +   
  scale_fill_manual(values=c("#4F76DF","#FFFFFF","#FFD90F",  
                "#83C33F","#2359F1",  
                "#6686C7","#E65120",  
                "#DA6901",  
                "#72C7E7")) +   
  theme(legend.position="none",  
     axis.title.x=element_blank(),  
     axis.title.y=element_blank(),  
     axis.text.x=element_blank(),  
     axis.text.y=element_blank()) +   
  ggtitle("Moe's Bar Chart")  
 g1  
 # Save image --------------------------------------------------------------  
 png("Simpsons.png")  
 print(g1)  # print explicitly so the chart is written even when the script is run via source()  
 dev.off()  

Intrinsic or Contextual?


Have you saved your spreadsheet with the name "New Version"? When you make a subsequent change is this filename still accurate?

Attribution can be tricky. We take for granted that many descriptions are permanent attributes of some object, but this may be misleading. For example, one may say that a certain person is one's oldest son. "Oldest"-ness was determined by context, but is now always a fact:  a male who lacks an older brother will always have that lack. The description arose from context but is now eternally true of the person, so it has become an attribute of the person.

What about youngest? A woman may have no younger sisters, so she is the youngest but this is dependent upon a context that can change. The moment she gets a little sister, her "youngest"-ness evaporates. Thus, this is a temporary, context-dependent attribute. In fact, every person who ever lived was at some point the youngest child (on Earth even, not just in one's family)!

This is a bit of an academic point, but is important to keep in mind when designing formal methods of storing information such as in database logical modeling. It can also be useful when thinking about public policy and taking care to understand whether a status is intrinsic or contextual:  a hopeful thought that people are treated fairly as individuals and not pigeon-holed.

Learn This When Leaving School for Work

If you are transitioning from being a student in higher education to entering the workforce, you are in for some culture shock. One example of this comes in the form of how others react to your statements. In the classroom, your instructor gives you a problem and you think through it and state your answer. The instructor knows the answer and can instantly tell that you are correct. In the business world, no one has an answer key. Thus, if you are unaccustomed to sharing your full thought process then your conclusion may seem like a non sequitur and you will be disappointed when you get stares instead of kudos for cracking a tough one.

Understanding this difference is important. To overcome this issue, be sure to share the logical steps you went through to arrive at your conclusion. Think of it like a consulting case interview question.

Bet Your Company, Not Your Industry

Congrats on the career change! Now you are exposed to a different set of risks to your income. You may be a corporate rockstar but that's a relative concept. Most likely, your company's fortunes drift up and down and your contribution is marginal in the big picture. For example, you could bring in $10 million profit one year and yet be a rounding error to many companies. Even larger is the risk to the industry (see fax machines). Seems unfair, but how can you effectively insulate yourself? Think like a hedge fund!

Let's say you have reason to believe that PepsiCo will fare better than Coca-Cola. A naive investor might buy shares in PEP. In reality, this investor is betting on both Coca-Cola AND the beverages industry. KO shares might drop 20% while PEP only drops 5%; congrats on your thesis but sorry about your account. A savvier investor might buy PEP and short sell KO at the same time. Now the exact outlook is reflected in the net trading position. Back to your situation...

As a full-time employee, your position is the opposite of diversification. Your income depends on your firm, while your firm's income depends on its competitive position and the industry's performance/outlook. You may even own company shares in a retirement fund. That's a big bet on one roulette number. Let's apply the hedge fund approach. It may be a bit distasteful to short sell your company's stock, plus you likely believe there is a sunny outlook since you agreed to work there. So, accept the new job offer, then short your industry via an exchange-traded fund. Your exposure is now limited to how your company performs relative to its industry, regardless of the larger economic picture.

Granted, this could be a bit complicated for employees who don't follow finance. A neatly packaged product could help more people cover this risk and greatly reduce the impact of unemployment or economic downturns for the savvy worker. Note that this financial arrangement effectively replicates unemployment insurance cash flows:  the employed pay a repeated small amount while working but get a return when a downturn happens; the method above is similar, but more tailored to the individual and his or her company.

Use Signaling to Overcome Information Asymmetry in Life Insurance

I hope you are a picture of health, but if you aren't let's pretend so for a moment.

You eat well, exercise, take care of yourself, and avoid undue risk in your lifestyle. Your family depends on your income and you want to insulate them from the financial burden caused by a freak accident that might happen to you. How do you communicate your great health to a life insurance carrier? Information asymmetry is an issue for the carrier who wants to cover you but cannot be sure that you are not withholding some valuable piece of information.

You need a way to signal your belief. Signaling that is costly to the signaler is the most valuable way to communicate your belief. Perhaps one way is to buy a policy that does not go in force for X years (maybe X is 3). The carrier should interpret this favorably and lower your premiums considerably. Since you still need coverage in the short term, you decide to buy a regularly priced policy for that interim period. But after X years, you are paying a low amount and your carrier is content with the arrangement. Any other creative signaling ideas?

How to Fix Basketball's Tiresome Endgame

March Madness is upon us. Basketball fans rejoice. However, these games are really two distinct games:  the first lasts 39 game minutes and is based on quickness, teamwork, athleticism, strategy, and will; the second lasts 1 game minute and is based on a free throw contest, interspersed with countless fouls and timeouts.

Granted, these last-minute affairs can be exhilarating but that's because of the stakes not the action. Don't be fooled by ESPN:  those exciting clips should include the countless commercial breaks in the last 30 seconds to reflect accurately the tedium that comes with constant pausing.

There may be one simple fix to restore the game to 40 minutes of basketball. When a foul is called, allow the team the option to forgo the free throw shooting opportunity in exchange for a fresh shot clock. A competitive response to this rule change might be overly physical attempts to steal the resulting inbound pass, but this is already accounted for with the intentional foul rule (if it's enforced).

I have been involved with sports long enough to know that the true competitive response is not easily predictable so I would recommend experimenting with this rule change in exhibition matches and lower divisions.

Sunday, January 10, 2016

Visualizing High Dimensions as DNA Strands

For a community project, I needed to research which U.S. cities were most similar to mine. The U.S. census has some wonderful data that covers 1,579 statistical areas, using the Office of Management & Budget's definition.

With this data, I selected the relevant attributes and then calculated the root mean squared error of the scaled distances from the target city as a (dis)similarity metric. After sorting the results, I could identify the ten most similar cities.
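
For the curious, here is a minimal sketch of that kind of (dis)similarity calculation in R. It is not the original analysis: the city names, attributes, and values below are all made up, and I am assuming "scaled" means standardizing each attribute to mean 0 and standard deviation 1.

```r
# Made-up attribute table: one row per city (names and values are hypothetical)
cities <- data.frame(
  population    = c(120, 95, 480, 130, 60),
  median_income = c(52, 48, 71, 55, 39),
  median_age    = c(36, 38, 33, 35, 41),
  row.names     = c("Target", "Alpha", "Beta", "Gamma", "Delta")
)

# Put every attribute on the same scale (mean 0, standard deviation 1)
scaled <- scale(cities)

# Difference between each city and the target, attribute by attribute
diffs <- sweep(scaled, 2, scaled["Target", ])

# Root mean squared difference across attributes: smaller means more similar
rmse <- sqrt(rowMeans(diffs^2))

# Rank the other cities from most to least similar and keep up to ten
ranking <- sort(rmse[names(rmse) != "Target"])
head(ranking, 10)
```

Scaling first matters because otherwise an attribute measured in large units, such as population, would swamp everything else in the distance.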

But there was a catch... I was to share my findings in a presentation and raw statistics aren't always conducive to making engaging slide shows. Displaying two dimensions is straightforward, maybe even three or four are doable. My analysis had forty-plus dimensions. My approach was to think of each attribute as a point in a DNA strand. When you are finished, each little twist and turn represents another data point. Up or down ticks don’t matter, they just show the shape of the city in data-vision. FYI, all images were created using the ggplot2 package in R.

Start with a graph of each (scaled) point plotted.

Next, remove the grid lines.

Connect the dots.

Display all of the other cities' strands.

Subset to the ten cities most similar to yours.
In your presentation software, just set the animation or action to show each image in succession. I could tell that the audience was able to comprehend the message of how I discovered the ten most similar cities to ours. Thought I would share this little technique in case you face a similar challenge. By the way, if anyone has seen this concept before I would love to know how it has been used elsewhere.
Visualized as a layered animation
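
If you want to try the strand idea yourself, below is a rough ggplot2 sketch of the final layered image. It is not the code behind the images above: the data is randomly generated, the city and attribute names are placeholders, and the colors are arbitrary choices.

```r
# Made-up scaled attributes: rows are cities, columns are attributes
set.seed(1)
scaled <- matrix(rnorm(5 * 12), nrow = 5,
                 dimnames = list(c("Target", "Alpha", "Beta", "Gamma", "Delta"),
                                 paste0("attr", 1:12)))

# Long format: one row per (city, attribute) pair, as ggplot2 expects.
# as.vector() reads the matrix column by column, so cities cycle fastest.
strands <- data.frame(
  city      = rep(rownames(scaled), times = ncol(scaled)),
  attribute = rep(seq_len(ncol(scaled)), each = nrow(scaled)),
  value     = as.vector(scaled)
)

# Draw one strand per city; this part is skipped if ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  target <- subset(strands, city == "Target")
  g <- ggplot(strands, aes(x = attribute, y = value, group = city)) +
    geom_line(colour = "grey70") +            # every city's strand in the background
    geom_line(data = target, colour = "red") + # connect the target city's dots
    geom_point(data = target, colour = "red") + # the target city's scaled points
    theme_void()                               # no grid lines, axes, or labels
  print(g)
}
```

To rebuild the step-by-step sequence, save one image per stage (points only, then lines, then all cities, then the similar subset) by adding the layers one at a time.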