Thursday, January 30, 2014

The Meta- State of the Union 2014

Max Ghenis has a nice text analysis of Martin Luther King Jr.'s famous "I Have a Dream" speech. You can read about his methodology here: Statistics meets rhetoric: A text analysis of "I Have a Dream" in R.

This got me wondering about President Obama's 2014 State of the Union speech. Using his template, I charted the speech below. Its sentiment is slightly positive overall, with a bit more positivity in the last 10%. Without hearing the words, the data reveal a generally optimistic yet restrained message (most data points are within 0.2 units of zero).
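Max's template handles the sentiment scoring itself. The sketch below only illustrates the general idea behind lexicon-based polarity, and it is an assumption about how such scorers work, not the template's actual code: each word is looked up in a scored word list, the scores are summed, and the sum is scaled by the square root of the sentence length (one common normalization; others exist). The tiny `LEXICON` is hypothetical.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons score thousands of words.
# "vice" is scored negative to mimic the quirk noted in the P.S.
LEXICON = {
    "safer": 1, "stable": 1, "freedom": 1, "easy": 1,
    "crisis": -1, "unemployment": -1, "vice": -1,
}


def polarity(sentence: str) -> float:
    """Sum the word scores, scaled by sqrt(word count) so that long
    sentences are not automatically more extreme."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    score = sum(LEXICON.get(w, 0) for w in words)
    return score / len(words) ** 0.5
```

With this toy list, `polarity("The vice president spoke.")` comes out at -0.5, because a plain word list has no notion that "vice president" is a title rather than a vice.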

Readability was also fairly steady on average, but there is quite a bit of variability on the higher end. This suggests a speech written intentionally at roughly a high school reading level that occasionally includes some "big words." These words do not seem to be clustered together, so perhaps one goal was to avoid long stretches of less familiar words.
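The post doesn't say which readability formula the template uses. For illustration, here is the Flesch-Kincaid grade level, a standard formula that maps text onto US school grades (roughly 9 to 12 for high school): 0.39 × words per sentence + 11.8 × syllables per word - 15.59. The syllable counter is a rough vowel-group heuristic of my own, not a dictionary lookup, so treat the scores as approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, then drop a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' is usually silent ("make") but not in "-le" ("table")
    # or "-ee" ("agree").
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z][A-Za-z'’]*", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Scoring each sentence separately and looking at the spread would show the pattern described above: a steady middle with occasional spikes wherever the longer words appear.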

Now, sentiment and readability can be thought of as inputs to a speech, like ingredients in a soup. One output of a speech is memorability. I borrow Max's idea of using Google search hit counts as a proxy for memorability. What's the verdict? Fairly bland results. Well, there are a couple of inflections: the curve starts out rising, dips at about the 40% mark, and then rises slightly to the end.
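These curves are described by position in the speech (the 40% mark, the last 10%). One simple way to get that view from per-sentence scores, whether sentiment, readability, or search-hit counts, is to average them within equal slices of the speech. This is an assumed approach for illustration; the template may instead fit a smoothed curve. Collecting the hit counts themselves (searching each sentence as an exact phrase) is not shown.

```python
def bin_by_position(scores: list[float], n_bins: int = 10) -> list[float]:
    """Average per-sentence scores within equal slices of the speech.

    Sentence i of n sits at relative position i / n, and bin k covers
    positions [k / n_bins, (k + 1) / n_bins). Empty bins come back as NaN.
    """
    if not scores:
        raise ValueError("need at least one score")
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(scores)
    for i, score in enumerate(scores):
        k = i * n_bins // n  # integer bin index, always < n_bins
        totals[k] += score
        counts[k] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]
```

With the default ten bins, the last value is the average over the final 10% of sentences.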

A quick analysis shows that readability is a better predictor than sentiment of how memorable the speech's sentences are. However, neither attribute was very predictive for this particular speech.
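One way to make "readability predicts better than sentiment" concrete is to fit a one-variable least-squares line for each attribute against memorability and compare R², the share of variance explained. This is a sketch of that comparison, not necessarily the analysis behind the claim above. For a single predictor, R² equals the squared Pearson correlation, which is what the function computes.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 of a one-variable least-squares fit of y on x
    (equivalently, the squared Pearson correlation)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return sxy * sxy / (sxx * syy)
```

Calling `r_squared(readability, memorability)` and `r_squared(sentiment, memorability)` on the per-sentence lists, and finding the first larger but both small, would match the finding above.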

What were the most memorable sentences?

  1. And when our children’s children look us in the eye and ask if we did all we could to leave them a safer, more stable world, with new sources of energy, I want us to be able to say yes, we did.
  2. When we rescued our automakers, for example, we worked with them to set higher fuel efficiency standards for our cars.
  3. Our freedom, our democracy, has never been easy.
  4. "We are the face of the unemployment crisis," she wrote.
  5. Kids, call your mom and walk her through the application.

Two appeals to grand ideals (1 & 3). Two references to red tape as a benefit (2 & 5). One quotation reminding us of real human suffering (4). I think people from all parties might consider that a representative mix of Barack Obama.

P.S. The sentiment algorithm considers "vice" a negative word, even in "vice president"; I decided not to change it...

Monday, January 20, 2014

Listen to the Whole Album! Or, Smells Like Teen Spirit In Bloom

In the golden age of vinyl, a music album was often created with the intention that the listener could best appreciate the music by listening to the songs in order. Much would be lost if you played Pink Floyd's The Wall on shuffle.

Nowadays, music is thought of almost exclusively at the single-song level. Go to iTunes and pick the song you want to download. It's not clear why artists even wait for sets of ten songs before releasing them. Certainly some artists care about song order and strive to make a coherent collection based on a theme or story. Let's take an extreme case using a fictitious band: Sleepy Babies record an album and are dead set on having it listened to in order. What can they do, given today's singles mentality?

Let's distinguish between song number and track number. A song is the complete work (verse-chorus-verse, as Kurt Cobain used to say). The track number is the slot in the playback order.

The band's engineer could start by having track 1 end at song 1's halfway point. Track 2 would then be the second half of song 1 plus the first half of song 2. Likewise, track 3 would be the second half of song 2 plus the first half of song 3. Continue this half-and-half method until the final track, which would be the second half of the penultimate song plus the entire final song (to wrap up cleanly and keep the song count equal to the track count). [Radiohead could turn this into a Möbius strip of songs by combining the final song's ending with the first song's beginning.]
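The half-and-half scheme is easy to compute from song lengths. This hypothetical `track_boundaries` helper takes each song's duration in seconds and returns where each track starts and ends on the continuous album timeline. The cuts fall at the midpoint of every song except the last, which stays whole, so the track count equals the song count.

```python
from itertools import accumulate


def track_boundaries(durations: list[float]) -> list[tuple[float, float]]:
    """Return (start, end) times, in seconds, of each track on the album."""
    if not durations:
        return []
    # Start time of each song on the continuous album timeline.
    starts = [0, *accumulate(durations)][:-1]
    total = sum(durations)
    # Midpoints of every song except the last one become the cut points.
    cuts = [s + d / 2 for s, d in zip(starts[:-1], durations[:-1])]
    edges = [0, *cuts, total]
    # Consecutive cut points delimit the tracks.
    return list(zip(edges[:-1], edges[1:]))
```

For three songs of 200, 300, and 100 seconds, the tracks run 0-100, 100-350, and 350-600 seconds: track 2 is the back half of song 1 plus the front half of song 2, and track 3 is the back half of song 2 plus all of song 3.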

Convoluted, yes, but a listener playing these tracks in order would not notice anything amiss. Try to listen to a specific song, though, and it's a huge hassle. Just as the artist intended.

Network Analysis of the RISK Board Game Map, Coursera

Coursera has a class that started recently called Social and Economic Networks: Models and Analysis. The professor, Matthew O. Jackson of Stanford, teaches how to model and analyze networks such as friendships, commerce, or contagions. Thus far, he has explained several metrics useful for describing a network and its nodes. Betweenness, closeness, and various other centrality and clustering measures help make sense of complex networks.

As a complete amateur (two weeks in), I made a quick attempt at applying a few of these concepts to a familiar network: a RISK game board. RISK is a game that simulates Earth's countries (some are regions), and the objective is to dominate the globe by winning battles against other players. A country can be attacked only from an adjacent country, so it quickly becomes clear that some countries are harder to defend than others. For example, Afghanistan touches six countries while Peru touches only two. Below is the game map:

At right is a chart summarizing how many adjacent countries each country has. The link between two adjacent countries is called an "edge." A few countries are conducive to isolationist-style play (holing up in Australia is a common strategy for meek players who want to inherit the Earth). Countries exposed to five or six others can be difficult to defend. However, as a player's realm expands, the only edges of concern are the outermost ones; internal borders can be reached only if the perimeter is first breached.
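Counting neighbors is just counting each country's edges (its degree). The sketch below does that for a small fragment of the board, Australia plus its link to Siam, written from my memory of the classic map; your edition's borders may differ, and the full board would need every border listed. Because Siam's other borders are left out, its degree of 1 here is an artifact of the fragment.

```python
from collections import Counter

# Hypothetical fragment of the board (Australia plus its link to Siam),
# from memory of the classic map; not the full network.
BORDERS = [
    ("Siam", "Indonesia"),
    ("Indonesia", "New Guinea"),
    ("Indonesia", "Western Australia"),
    ("New Guinea", "Western Australia"),
    ("New Guinea", "Eastern Australia"),
    ("Western Australia", "Eastern Australia"),
]


def degrees(edges):
    """Number of neighbors (edges) touching each country.
    Assumes each border appears once in the list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_histogram(edges):
    """Map each neighbor count to how many countries have it."""
    return Counter(degrees(edges).values())
```

A histogram like this is the data behind a chart of adjacent-country counts.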

The course uses network analysis software called Pajek. I loaded the RISK countries (vertices) and their neighbors (technically, the edges are the links between them), then let the software draw a reasonable approximation of the true map, which I tweaked manually. Of special note is the Alaska-Kamchatka connection, which the board game displays as wrapping around the back of the globe.
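Pajek reads networks from plain-text .net files. A minimal writer might look like this, assuming the basic layout of that format: a `*Vertices n` line, one numbered and quoted label per vertex starting at 1, then `*Edges` followed by pairs of vertex numbers for undirected links. Check Pajek's documentation for extras such as vertex coordinates.

```python
def to_pajek(edges) -> str:
    """Serialize an undirected edge list of (name, name) pairs as Pajek .net text."""
    names = sorted({v for edge in edges for v in edge})
    # Pajek numbers vertices from 1.
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[name]} "{name}"' for name in names]
    lines.append("*Edges")  # undirected links; directed ones would go under *Arcs
    lines += [f"{index[a]} {index[b]}" for a, b in edges]
    return "\n".join(lines) + "\n"
```

Writing `to_pajek([("Alaska", "Kamchatka")])` to a file yields a two-vertex network containing the wrap-around link.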

Let's look at a few metrics:
  • Betweenness
    • Highest: North Africa, Middle East, China
    • Lowest: Japan, Eastern Australia, Argentina
  • Closeness
    • Highest: Ukraine, Middle East, Afghanistan
    • Lowest: New Guinea, Western Australia, Eastern Australia
  • Proximity Prestige (4)
    • Highest: Ukraine, Middle East, Afghanistan
    • Lowest: New Guinea, Western Australia, Eastern Australia
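These metrics came from Pajek. As a cross-check, here is a plain-Python sketch run on the same hypothetical Australia-plus-Siam fragment (not the full board, so its numbers won't match the lists). Betweenness uses Brandes' algorithm. Closeness is the number of vertices reached divided by the sum of distances to them. Proximity prestige follows the definition used in the Pajek literature, as I understand it: the share of vertices that can reach v, divided by their average distance to v. On an undirected, connected map every country reaches every other, so proximity prestige reduces to closeness, which would explain why those two lists are identical.

```python
from collections import deque

# Hypothetical fragment of the board (Australia plus its link to Siam),
# from memory of the classic map; not the full network.
BORDERS = [
    ("Siam", "Indonesia"),
    ("Indonesia", "New Guinea"),
    ("Indonesia", "Western Australia"),
    ("New Guinea", "Western Australia"),
    ("New Guinea", "Eastern Australia"),
    ("Western Australia", "Eastern Australia"),
]


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs(adj, source):
    """Breadth-first search returning distances, shortest-path counts,
    shortest-path predecessors, and the order vertices were reached."""
    dist = {source: 0}
    sigma = dict.fromkeys(adj, 0)
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def betweenness(adj):
    """Brandes' algorithm; unnormalized, each unordered pair counted once."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest vertices first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every pair was seen once from each end, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


def closeness(adj):
    """Vertices reached divided by the sum of distances to them (0 if isolated)."""
    result = {}
    for v in adj:
        dist = bfs(adj, v)[0]
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def proximity_prestige(adj):
    """Share of other vertices that reach v, divided by their mean distance to v.
    On an undirected graph, "reaches v" is the same as "reachable from v"."""
    n = len(adj)
    result = {}
    for v in adj:
        dist = bfs(adj, v)[0]
        reach = len(dist) - 1
        if reach == 0:
            result[v] = 0.0
        else:
            result[v] = (reach / (n - 1)) / (sum(dist.values()) / reach)
    return result
```

On this fragment, Indonesia, the only gateway between Siam and Australia, gets the highest betweenness (3.0), and proximity prestige matches closeness for every country.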
It looks like the most connected countries (regions) face a lot of geopolitical issues today. Conversely, the least connected seem to be off the radar as far as global headlines go.

Maybe all oceans should be called Pacific.