Friday, October 23, 2015

What Does the AVERAGE Brand Logo Look Like?


PNG images are essentially a grid of values that represent colors to display. Since each cell in the grid is made up of numbers, I got curious about what it might mean to aggregate multiple PNGs. What would it look like to average two or more images? Median? Mode? Random?

To do so, I pulled the top 100 brands' logos from Best Global Brands.
Then I used the (layers of) values as inputs to aggregate in various ways.

Averaging these logos yields this gray blob that looks roughly, well, saturnine.

 Taking the median value results in what looks like a messy paintbrush stroke.

The mode reflects the heavy use of black.
The random one looks galactic! I like it the most...


Clearly, there is quite the uniformity in the logo design. Both horizontal and vertical symmetry are present. There is a bias towards a wider shape, similar to the dimensions of a word. Also, three general shapes tend to appear:  a perfect square, a perfect circle, and the long rectangle.

Is there one agency that designed most of these? They have so much in common.

Below is the R code. It is long because it reflects the evolution of my thought process. A dash of apply could speed up the explicit naming.


1:    
2:  # Prepare -----------------------------------------------------------------  
3:  rm(list=ls());gc()  
4:  pkg <- c("RCurl","XML","png","data.table","reshape","grid")  
5:  inst <- pkg %in% installed.packages()  
6:  if(length(pkg[!inst]) > 0) install.packages(pkg[!inst])  
7:  lapply(pkg,library,character.only=TRUE)  
8:  rm(inst,pkg)  
9:  setwd("your folder here")  
10:  set.seed(4444)  
11:    
12:    
13:  # Download HTML -----------------------------------------------------------  
14:  doc <- htmlParse("http://interbrand.com/best-brands/best-global-brands/2015/ranking/",  
15:           encoding="UTF-8")  
16:    
17:    
18:  # Parse HTML for image sources and info ----------------------------------  
19:  plain.src <- xpathApply(doc,"//img[@class='logo-img']",xmlGetAttr,"src")  
20:  plain.alt <- xpathApply(doc,"//img[@class='logo-img']",xmlGetAttr,"alt")  
21:    
22:  plain.rank <- xpathApply(doc,"//div[@class='brand-info brand-rank brand-col-1']",  
23:               xmlGetAttr,"title")  
24:  plain.region <- xpathApply(doc,"//div[@class='brand-info brand-region']",  
25:               xmlGetAttr,"title")  
26:  plain.country <- xpathApply(doc,"//div[@class='brand-info brand-country brand-col-5']",  
27:                xmlGetAttr,"title")  
28:  plain.sector <- xpathApply(doc,"//div[@class='brand-info brand-sector brand-col-6']",  
29:                xmlGetAttr,"title")  
30:  plain.value <- xpathApply(doc,"//div[@class='brand-info brand-value brand-col-7']",  
31:                xmlGetAttr,"title")  
32:    
33:    
34:  # Compile info ------------------------------------------------------------  
35:  d0 <- data.frame(Rank=unlist(plain.rank),  
36:           Country=unlist(plain.country),  
37:           Region=unlist(plain.region),  
38:           Sector=unlist(plain.sector),  
39:           Value=unlist(plain.value),  
40:           stringsAsFactors=FALSE)  
41:    
42:  d0$Rank <- gsub("Rank: ","",d0$Rank)  
43:  d0$Rank <- as.numeric(d0$Rank)  
44:  d0$Country <- gsub("Country: ","",d0$Country)  
45:  d0$Region <- gsub("Region: ","",d0$Region)  
46:  d0$Sector <- gsub("Sector: ","",d0$Sector)  
47:  d0$Value <- gsub("Value: ","",d0$Value)  
48:  d0$Value <- gsub("[^0-9]","",d0$Value)  
49:  d0$Value <- as.numeric(d0$Value)*1000000  
50:    
51:    
52:  # Download images ---------------------------------------------------------  
53:  n <- length(plain.src)  
54:  for(i in 1:n) {  
55:   if(!file.exists(paste0("Rank",d0$Rank[i],".png"))) {  
56:    download.file(paste0("http://interbrand.com",plain.src[[i]]),  
57:           destfile=paste0("Rank",d0$Rank[i],".png"),mode="wb")  
58:    Sys.sleep(0.1)  
59:   }  
60:  }  
61:    
62:    
63:  # Read in logo PNGs -------------------------------------------------------  
64:  for(i in 1:n) {  
65:   assign(paste0("Rank",d0$Rank[i]),readPNG(paste0("Rank",d0$Rank[i],".png")))  
66:  }  
67:  dims <- dim(Rank1)  
68:    
69:    
70:  # Combine arrays ----------------------------------------------------------  
71:  d1 <- vector()  
72:  for(i in 1:n) {  
73:   d1 <- c(d1,as.vector(get(ls()[grep(pattern="Rank.*",x=ls())][i])))  
74:  }  
-75:  a1 <- array(d1,c(dims,n))  
76:    
77:    
78:  # Clean up environment ----------------------------------------------------  
79:  rm(list=ls()[!ls() %in% c("d1","a1","n","dims")])  
80:    
81:    
82:  # Get element-wise metrics -----------------------------------------------  
83:  logo1.avg <- apply(a1,1:3,mean)  
84:  logo1.med <- apply(a1,1:3,median)  
85:  logo1.mod <- apply(a1,1:3,function(x) unique(x)[which.max(tabulate(match(x,unique(x))))])  
86:  logo1.ran <- apply(a1,1:3,sample,1)  
87:    
88:    
89:  # Display results ---------------------------------------------------------  
90:  grid.raster(logo1.avg)  
91:  dev.off()  
92:  grid.raster(logo1.med)  
93:  dev.off()  
94:  grid.raster(logo1.mod)  
95:  dev.off()  
96:  grid.raster(logo1.ran)  
97:  dev.off()  
98:    
99:    

Tuesday, September 15, 2015

cuRve stitching

Remember curve stitching from grade school? It makes for a nice tutorial for working with some common R functionality.

Here's an example of how to create the appearance of a parabola from plotting a series of straight lines:


pkg <- c("ggplot2","reshape2","RColorBrewer")
inst <- pkg %in% installed.packages()
if(length(pkg[!inst]) > 0) install.packages(pkg[!inst],repos="http://cran.rstudio.com/")
lapply(pkg,require,character.only=TRUE)
rm(list=c("pkg","inst"))


n <- 50
m1 <- data.frame(x=c(rep(0,n),seq(n,1,-1)),
                 y=c(seq(1,n,1),rep(0,n)),
                 group=rep(seq(1,n,1),2))


p1 <- ggplot(data=m1,aes(x=x,y=y,group=group,color=factor(group))) + 
  scale_color_manual(values=rep(brewer.pal(8,"Dark2"),times=n*2)) + 
  geom_line(size=1.0) + 
  theme(legend.position="none",
        axis.title=element_blank(),
        panel.grid=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank())
p1





Thursday, July 9, 2015

Top 2 Packages for Newly Hired Data Scientists



 library(NewCo knowledge)
function (X, FUN, ..., ) {FUN <-
                                Read the business wires +
                                Go to lunch with wide range of people +
                                Read the 10-K and maybe 10-Q +
                                Find a go-to source for “stupid questions”
                else Ignorant
}



library(credibility)
function (X, FUN, ..., ) {FUN <-
                                Double-check all assumptions +
                                Underpromise +
                                Save counterintuitive findings for last +
                                Find a potential advocate and find a project to help with
                else Ignored
}



Tuesday, March 17, 2015

Finding Similar European Soccer Clubs (with R & Shiny)

Are you a die-hard supporter of one European soccer (football) team (club)? Having a rough season, or just want to watch more matches with passion?

This European Team Finder analyzed 126 attributes of the top-flight teams in the marquee national leagues of Europe. Everything was considered, such as tackles, fouls, pass type, crosses, throw-ins, shots by body part, set pieces, and aerial battles.
  • England's Premier League
  • France's Ligue Un
  • German's Bundesliga
  • Italy's Serie A
  • Spain's La Liga
Just select a team you already like and the widget instantly displays the most similar team from each of the leagues, along with the least similar team in cause you love to hate (Spurs?).

Under the hood, similarity is quantified in R by creating a Euclidean distance matrix. The resulting matrix is displayed by loading it into Shiny, then accessing the closest/farthest distances for a selected club.

Raw data was sourced from the excellent website whoscored.com.

Wednesday, March 11, 2015

Tableau 9.0 Connects Directly to R Data Files

Tableau 9.0 will be released soon.

Tableau 8 already integrates with some R functionality, but 9.0 actually allows direct connection to R data files.

Tableau continues to remove friction between itself and R, further justifying its superior Gartner position.

Wednesday, February 11, 2015

R's Tricky == Operator, or "It depends on what the meaning of the word 'is' is"

One scenario where R can trip up a programmer is when using the == operator or its relatives. The help page notes that "NA values are regarded as non-comparable", which introduces some potentially unexpected behavior.

As a toy example, look what happens when trying to subset on a column that includes NA values.
df <- data.frame(a=11:15,b=c(3,NA,4,4,NA))
df
df[df$b==4,]
df[df$b<=4,]
In each case, rows with an NA in the b column are returned. This might be surprising and not obvious if wrapped inside of a an aggregation such as nrow or sum. A safer way to accomplish this subsetting is by using the %in% operator. Like so:
df[df$b %in% 4,]

SQL Interjections

When I have to re-learn how to use PARTITION BY, the interjections get more colorful