First Book of the Year

Last year I started things off by making Ashlee Vance’s biography of Elon Musk the first book I read of the year. I wanted to start 2016 off better than 2015 and thought this book might help my thinking. Musk’s story is quite interesting, if only to show how much he believes in himself even when the odds seem stacked against him and the money in the bank runs low. I tried to use Musk’s story to improve my own self-confidence, and the most concrete way I was able to do so was to take on less and instead focus on doing a better job with what I had in front of me.

I will be repeating this little project in 2017 by starting the year with Spread Spectrum: Hedy Lamarr and the Mobile Phone by Rob Walters. I know very little about spread spectrum technology, and Hedy Lamarr led a fascinating life that remains greatly underappreciated. Hopefully this book will prove as motivational over the course of the year as Musk’s biography was.

Election 2016

It has now been a month since the 2016 US Presidential election and I am still stunned by the outcome but am ready to move on.

The major issues I focused on while voting at the Presidential level were a better climate policy and more equal treatment for minorities and other marginalized populations. When I stop and think about why these were the major issues for me, I realize that I am pretty fortunate. I have a great job, generally feel safe, and am optimistic overall about the future and the economy.

The biggest realization for me was that although I care deeply about these issues on a national level, I need to be more involved at the community level.

After thinking about it, there are three ways I want to get more politically involved:

  1. Increase the amount of money I donate to specific organizations on a recurring basis.

  2. Get more involved with organizations that focus on climate advocacy and immigrant populations. I have done some volunteer work with CarbonWA and want to get more involved with them, as well as with an organization that focuses on immigrants, such as ReWA.

  3. Write more letters to elected officials about the issues I am most passionate about. I helped make a GitHub repo of all the district boundaries in my hometown, but I have never used it for anything other than looking up addresses. At least I know where to look to figure out the various districts I live in.

Will these actions make a difference at the national level? Probably not, though it is hard to say. What they will do is make an impact at the local level and help me improve the community around me. If these issues are important enough for me to write this post about, they are important enough for me to get more involved with.

Has the Pac-12 Network Decreased UW Home Football Game Attendance?

The University of Washington Husky football team is taking on Rutgers this Saturday with kickoff at 11 AM PST. This is awfully early to start a game, especially one that falls on Labor Day weekend. The game is being aired on the Pac-12 Network, which is about to enter its fifth year of operation. This made me wonder: with the Pac-12 Network in place, has attendance at UW home football games decreased?

Fortunately, Wikipedia lists game attendance, which allows for a quick overview of UW home games stratified by network:

The purple dots are UW home games shown on the Pac-12 Network. Not entirely convincing, but at first glance the numbers don’t look too great for the network. I then restricted the data to home Pac-10/Pac-12 games and looked at attendance by season:

The high point in this figure is UW versus Oregon in 2013, while the particularly low point in 2015 was the Arizona game on Halloween, which fell on a Thursday that year. Why some executive at FS1 thought it would be a good idea to schedule a game then is beyond me.

In 2012, UW played its home games at CenturyLink Field while Husky Stadium was being renovated, and for the 2013 season the team returned to a smaller Husky Stadium. Did either of these factors impact attendance?

Not really; the difference in stadium size is minimal, which is reflected in this nearly identical figure.

Which Pac-12 opponents were the biggest draws on average?

Opponent            Average Attendance
Oregon              69,584
Washington State    68,862
Colorado            64,373
USC                 64,046
Oregon State        63,777
Stanford            63,360
UCLA                62,544
California          62,541
Arizona             60,756

Obviously there are a lot of factors that attendance alone can’t capture, but with a significant budget deficit largely blamed on reduced attendance, it seems like it might be time for UW to analyze whether the Pac-12 Network has really been worth the investment to date.

Full code for scraping Wikipedia on GitHub
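
The scraper lives in the linked repo, but the core idea is simple enough to sketch with pandas. The page titles, table layout, and column names below are assumptions for illustration rather than the exact code:

```python
# Minimal sketch: pull the schedule table (which includes attendance) from the
# Wikipedia article for each UW football season. Table layout and column names
# on Wikipedia vary by year, so treat the selectors here as assumptions.
import pandas as pd

frames = []
for year in range(2004, 2016):  # span of seasons to pull; adjust as needed
    url = f"https://en.wikipedia.org/wiki/{year}_Washington_Huskies_football_team"
    tables = pd.read_html(url)
    # Keep the first table that has an Attendance column (assumed to be the schedule).
    schedule = next(t for t in tables if "Attendance" in t.columns)
    schedule["season"] = year
    frames.append(schedule)

games = pd.concat(frames, ignore_index=True)
print(games[["season", "Attendance"]].head())
```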

This American Life Stats

Lately I have been listening to episodes of This American Life faster than they are making them, which means I have been going back to the archive for past unheard shows. Their website has a nice user section where you can log in and mark the episodes you have heard and your favorites. The archives are arranged by year, which naturally got me thinking about the number of episodes I have listened to by year. A search of GitHub revealed many libraries for downloading episodes of the podcast but none concerned with user statistics, so I decided to write my own. I am still very much a beginner with things like cookie handling and CSRF tokens, which is why I ultimately ended up using Splinter, which lets you automate browser actions. I used it to log in and navigate the TAL archives by year, then used BeautifulSoup to parse the HTML. Finally, I just wanted to visualize the results, so I used Mike Bostock’s D3 Bar Chart example.
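
The login-and-count part of the library boils down to something like the sketch below. The URLs, form field names, and the CSS class marking heard episodes are placeholders; the real selectors live in the repo:

```python
# Minimal sketch, not the full library: log in with Splinter, walk the archive
# pages by year, and count episodes marked as heard with BeautifulSoup.
from collections import Counter

from bs4 import BeautifulSoup
from splinter import Browser

heard_by_year = Counter()
with Browser("firefox") as browser:
    browser.visit("https://www.thisamericanlife.org/user/login")  # assumed URL
    browser.fill("name", "my_username")   # assumed form field names
    browser.fill("pass", "my_password")
    browser.find_by_value("Log in").first.click()

    for year in range(1995, 2017):  # archive pages are organized by year
        browser.visit(f"https://www.thisamericanlife.org/archive/{year}")  # assumed URL
        soup = BeautifulSoup(browser.html, "html.parser")
        # Assumes episodes you have listened to carry a "heard" class.
        heard_by_year[year] = len(soup.select(".heard"))

print(dict(heard_by_year))
```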

Pretty basic, but it gets the job done. Full code here on GitHub

Slopeplots of African GDP With Ggplot2

I finally got around to reading Poor Numbers by Morten Jerven and found it really interesting. Jerven’s basic argument is that the academic literature has either “neglected the issue of data quality and therefore accepted the data at face value or dismissed the data as unreliable and therefore irrelevant”, and that this causes many issues for the more data-driven approach to international aid seen in recent years.

The key table in the book was (in my opinion) a largely inscrutable one showing African economies ranked by per capita GDP according to three different sources of national income data: the World Development Indicators, Angus Maddison, and the Penn World Tables. The differences in the rankings are hard to parse in a table but should theoretically lend themselves well to a slopegraph, a chart type originally proposed by Edward Tufte in The Visual Display of Quantitative Information.

Although the result is not a true slopeplot, I was able to use a combination of geom_line from ggplot2 and the directlabels package to generate the following plot (which I will admit is a bit of a hack):

I was mainly interested in the variation among the top ten or so countries, which this plot handles well. The remaining 35 or so countries are difficult to tell apart, mostly due to very large differences in GDP. A log-transformed plot shows that there is generally more consistency within each source but some variation between them.

Slopegraphs are an effective and efficient way to visualize this type of data, which makes it odd that they are so rarely used and only barely mentioned in Tufte’s works. Hopefully more people being exposed to them will result in wider use.

Data from Table 1.1 of Poor Numbers; full code available at this gist

Changes in NPS Visits Over Time With D3

This year is the 100th anniversary of the National Park Service, and I was curious about how park attendance has changed recently. Andrew Flowers of FiveThirtyEight had a nice overview using official NPS data. The article was interesting, but unlike most of FiveThirtyEight’s pieces it had a stunning lack of interactivity in what I felt was the main figure of the piece.

I remade the figure using the same data but focusing on 2006 through 2015; basically, the popular parks stay popular and vice versa:

Great Smoky Mountains NP has dominated attendance since its creation, but which other parks have recently become popular? I used D3 to make a simple line plot that allows for interactively exploring park attendance based on each year’s change from a park’s mean attendance over the last ten years:

There is clearly a spike in recent years, maybe due to lower gas prices or the increased popularity of social media?
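
The metric behind that plot is straightforward to compute. Here is a minimal pandas sketch, assuming a hypothetical CSV of annual NPS visits with park, year, and visits columns:

```python
# Express each park-year's visits as the change from that park's mean visits
# over 2006-2015. File and column names are hypothetical; the underlying data
# comes from the official NPS visitation reports.
import pandas as pd

visits = pd.read_csv("nps_annual_visits.csv")  # columns: park, year, visits (assumed)
recent = visits[visits["year"].between(2006, 2015)].copy()

recent["change_from_mean"] = (
    recent["visits"] - recent.groupby("park")["visits"].transform("mean")
)

# Largest positive swings relative to each park's ten-year mean.
print(recent.nlargest(10, "change_from_mean"))
```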

Also, I was curious about what the trend looked like for the National Monuments.

The most immediate trend that jumps out is the effect of the closing of the Statue of Liberty in 2011, which indirectly caused a slowdown in visits to Castle Clinton, where the ferry tickets are sold.

I aim to visit some new parks and monuments this summer, and hopefully looking at the data like this will help me avoid the crowds.

A National Parks Tour With Feather

This year is the 100th anniversary of the National Park system, and the National Park Service is kicking things off with National Park Week, during which every National Park will be open for free. With all those savings, my next thought was: what would be the best way to visit all of them in a single trip?

To calculate this, I used a variant of the Concorde algorithm, a solver for the Traveling Salesman Problem; in other words, what is the shortest route I could use to visit every National Park?

This also gave me an excellent opportunity to use Feather, a format for writing dataframes to disk for interchange between R and Python. I wanted to use R largely because of Michael Hahsler’s TSP library, and I wanted to use Python because of the ease of use of the Google Maps API client. Finally, I wanted a static map to show the route, so I decided to return to R and use the ggmap library.
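
As a rough sketch of the Python half (with a placeholder park list, API key, and file name), the pairwise driving distances can be pulled with the googlemaps client and handed to the R TSP step through a feather file:

```python
# Build a long table of pairwise driving distances between parks and write it
# to feather for R to pick up. Park names, key, and file name are placeholders.
import feather  # the feather-format package; pandas' to_feather also works
import googlemaps
import pandas as pd

parks = ["Mount Rainier National Park", "Olympic National Park",
         "Crater Lake National Park"]  # the real list covers every National Park

gmaps = googlemaps.Client(key="YOUR_API_KEY")

rows = []
for origin in parks:
    for dest in parks:
        if origin == dest:
            continue
        result = gmaps.distance_matrix(origin, dest, mode="driving")
        meters = result["rows"][0]["elements"][0]["distance"]["value"]
        rows.append({"origin": origin, "destination": dest, "meters": meters})

distances = pd.DataFrame(rows)
# The R side can read this with read_feather() before building the TSP object.
feather.write_dataframe(distances, "park_distances.feather")
```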

I realize there are many ways to call R from Python and vice versa, but I wanted to try feather. As a first attempt, I was pretty impressed with its ease of use. My dataset was not large enough to comment on speed, but basic reading and writing in both R and Python feel very simple. The one issue I ran into was more of a user issue: it was challenging to rapidly flip back and forth between the two languages as I iterated on this code. The 0-indexing of Python versus the 1-indexing of R is handled entirely by feather, which is nice not to have to think about.

On the whole, I highly recommend checking out feather. As for me, it’s time to hit the road and start visiting some National Parks; according to Google Maps I only have 14,832.8 miles to go.

My code for this lives here

Super Bowl Sunday at Chuck’s

In the same vein as my previous post on beer sales at Chuck’s Hop Shop, I wanted to do a similar analysis, this time focusing on Super Bowl Sunday sales. Similar to last time, I made a few assumptions (a quick sketch of the resulting calculation follows the list):

  • A keg is on tap until it is empty.
  • Each keg only serves pints of beer.
  • A pint is the only unit served (i.e., no 8 oz. pours).
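
Under those assumptions, a beer’s time on tap is just the span between the first and last snapshot of the tap list it appears in. A minimal pandas sketch, with hypothetical file and column names:

```python
# Estimate hours on tap from periodic tap-list snapshots: a beer counts as on
# tap from the first snapshot it shows up in until the last.
import pandas as pd

# One row per (snapshot_time, brewery, beer) seen on the tap list (assumed layout).
snapshots = pd.read_csv("superbowl_taplist_snapshots.csv",
                        parse_dates=["snapshot_time"])

on_tap = (
    snapshots.groupby(["brewery", "beer"])["snapshot_time"]
    .agg(first_seen="min", last_seen="max")
    .reset_index()
)
on_tap["hours_on_tap"] = (
    on_tap["last_seen"] - on_tap["first_seen"]
).dt.total_seconds() / 3600

print(on_tap.sort_values("hours_on_tap").head(10))
```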

Anyway, here is a brief summary of the beers on tap for the shortest amount of time on Sunday:

Brewery          Beer                  Hours on tap
Cloudburst       Psycho Hose Beast…    0.25
Iron Fist        Mint Chocolate Im…    1.00
Ballast Point    Watermelon Dorado…    1.00
Wander           Wanderale, Belgia…    1.25
Sound            Humonkulous IIIPA     1.25
Cloudburst       Saison W/Grapefru…    1.25
Victory          Prima Pilsner         1.75
Commons          Holden Saison         2.00
Seattle Cider    Semi-Sweet Cider      3.00
Deschutes        Abyss ‘15 ½ Pint      3.00

Time on tap vs. ABV

Did beers with higher ABV sell faster?

Time on tap vs. cost

Did more expensive beer sell faster?

What does the relation between cost per pint and ABV look like?

Once again, all code lives here

Thoughts on Pronto

Pronto bikeshare is in serious financial trouble and may not make it to the end of March this year. There has been a lot of talk about the future of Pronto and its funding, but for now I just wanted to mention a few of the things I like about it.

  • It is excellent for connecting mass transit and your final destination. Waiting for a bus transfer and then taking said transfer sometimes feels like it takes forever, while hopping on a Pronto bike can make the trip dramatically faster.

  • It provides an ideal solution for just going somewhere and not worrying about your bike. Worried about locking up your bike in a certain area at a certain time? If there is a Pronto station nearby, that problem is easily solved.

  • It is fun, full stop. The bikes are very sturdy, and while they can feel a bit slow, I never worry that they will have physical problems or break down. Yeah, I realize I look like a dork while riding one, but then again I run my own blog, so who am I to judge?

  • Finally, cars treat you as if you have never been on a bike before and give you significant leeway. On my commuter bike I often get buzzed by cars, but on a Pronto I get treated like some tourist who has no clue what they are doing. From a safety standpoint, that’s pretty tough to beat.

I think that Pronto had a terrible rollout (starting a bikeshare program in October?), and as multiple people have shown, it does not have the best station placement compared to similarly sized metropolitan areas.

I have had multiple problems with bike docks and the helmet locker can sometimes be unresponsive. But, even after all that, I am long on Pronto and am constantly telling people about it. I had an entry in the Pronto Data Challenge. I was even planning on doing both the Emerald City Bike Ride and Obliteride on a Pronto bike just because I thought it would be fun.

Seattle is growing, really fast in fact, and giving people as many transit options as possible will only make it that much easier to move around. I really hope that Pronto lasts beyond the end of March and is eventually able to expand to cover more areas. As to whether the city should step in to save it or not, we’ll leave that exercise to the reader.

Weekend at Chuck’s

Chuck’s Hop Shop on 85th is a beer store with 40 beers on tap and hundreds if not thousands of bottled beers for sale. I have always been curious about which types of beer they go through fastest and which breweries are most popular. Fortunately, they post their current tap list on their website, which allowed me to look at their data over the course of the weekend.

I scraped their website every 15 minutes from opening on Friday, January 29 to closing on Sunday, January 31. Their website also lists cost per pint, cost per growler, and ABV.
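
The scraper itself is just polling on a timer. A rough sketch is below; the tap-list URL and CSS selectors are placeholders rather than Chuck’s actual markup:

```python
# Poll the tap list every 15 minutes and append a timestamped snapshot of each
# tap line to a CSV for later analysis.
import csv
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup

TAPLIST_URL = "http://chucks85th.com/taplist"  # placeholder URL

def scrape_once():
    html = requests.get(TAPLIST_URL, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    now = datetime.now().isoformat()
    with open("taplist_snapshots.csv", "a", newline="") as f:
        writer = csv.writer(f)
        # Assumes one table row per tap: brewery, beer, pint price, growler price, ABV.
        for row in soup.select("table.taplist tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if cells:
                writer.writerow([now] + cells)

while True:
    scrape_once()
    time.sleep(15 * 60)  # poll every 15 minutes
```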

For the purposes of this analysis, I made a few assumptions:

  • A keg is on tap until it is empty.
  • Each keg only serves pints of beer.
  • A pint is the only unit served (i.e., no 8 oz. pours).

With these assumptions in mind, my first question was: which beers go fastest? The table below shows the beers that were on tap for five hours or less:

Brewery         Beer                    Minutes on tap
Bale Breaker    Field 41 Pale           74.98333
Boneyard        Hop Venom IIPA          74.98333
Almanac         Elephant Heart de…      134.98333
Firestone       Wookey Jack CDA         224.98333
pFriem          Imperial IPA            240.00833
Sound           Dubbel Entendre         270.00000
Bale Breaker    Top Cutter IPA          277.48333
Kulshan         Bastard Kat IPA         284.98333
Roslyn          Brookside Pale Lager    284.98333
Breakside       Vienna Coffee OG …      299.86667
Bale Breaker    High Camp Winter …      300.00833

Tied for first place, at a stunning 1 hour and 15 minutes, were Bale Breaker Field 41 Pale and Boneyard Hop Venom IIPA. There may be another explanation, but with respective BeerAdvocate scores of 89 and 97, it’s easy to see why they are so popular.

Price of a pint vs. ABV

Is there a correlation between the price of a pint of beer and the ABV?

Drinking based on ABV

Do beers with higher ABV get ordered faster?

Are beers at Chuck’s Veblen goods?

Do pricier beers move faster?

Obviously there are many more types of analysis one could do with this data. I do think this analysis was sorely lacking in first-person research, something I intend to fix in the next analysis. Full code on GitHub.