Impact of Amazon Echo on Babies Named Alexa

A few years ago, there was an article in the Seattle Times about girls named Alexa after the introduction of the Amazon Echo. A friend and I were chatting about it and wondered whether the Amazon Echo has led to a reduction in girls named Alexa. According to Wikipedia, the Amazon Echo was first introduced on November 6, 2014, and Hadley Wickham was kind enough to organize an R package of baby names as recorded by the United States SSA. This data shows a steep decline in the number of female babies named Alexa, which may be due to a variety of factors:

In the process of making this first plot I realized that there are also boys named Alexa in the SSA data. Let's see what the data for boys looks like:
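The filtering behind both plots can be sketched roughly as follows. This is a Python illustration rather than the R code behind the post; the row layout (year, sex, name, count) and the function name `yearly_counts` are assumptions, and the counts are invented placeholders, not real SSA figures.

```python
from collections import defaultdict

# Placeholder rows shaped as (year, sex, name, count). The counts are
# invented for illustration and are NOT real SSA figures.
ROWS = [
    (2013, "F", "Alexa", 100),
    (2013, "M", "Alexa", 2),
    (2013, "F", "Emma", 300),
    (2014, "F", "Alexa", 90),
    (2015, "F", "Alexa", 80),
    (2015, "M", "Alexa", 1),
]


def yearly_counts(rows, name, sex):
    """Return {year: total count} for one name and one recorded sex."""
    totals = defaultdict(int)
    for year, row_sex, row_name, count in rows:
        if row_name == name and row_sex == sex:
            totals[year] += count
    return dict(sorted(totals.items()))


girls = yearly_counts(ROWS, "Alexa", "F")
boys = yearly_counts(ROWS, "Alexa", "M")
```

Plotting `girls` and `boys` against year would give one line per recorded sex, which is the comparison the two plots make.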

What Is the Most Remote Airport From the City It Serves in the United States?

Recently, there was a question on travel.SE about locating the airport furthest from the city it is supposed to serve. There are some interesting answers, including the winner from Paris, where the airport is approximately 147 km away from the Paris metro area. I started to wonder about this question for airports within the US and stumbled onto this Wikipedia page, which served as a good baseline for a quick analysis.

I used the Google Maps location API to calculate latitude and longitude for the center of the town or city and the location of the airport. I then used the Google Maps directions API to calculate the driving distance between the airport and the center of the town it serves. This led to some interesting edge cases. For example, Peach Springs, Arizona is on this list and the airport is about 113 miles away:

Peach Springs is a Census Designated Place and largely serves the Hualapai tribe. Should it have been included on this list?

I also noticed on this Wikipedia page that there are many airports with split locations (Sea/Tac Airport, for example) or a single airport serving multiple locations (such as Harrisburg International Airport serving Harrisburg/Middletown, PA). For these cases I just used the first city mentioned. The Wikipedia article also lists enplanements as recorded by the FAA in 2015, which provides a useful metric for comparison. First I looked at distance to airport versus number of passengers:

It is not too surprising that many of the airports have very few enplanements each year and are reasonably close to the center of town.

Then I looked at only airports that had over one million enplanements which narrowed my list of airports down to 27:

To me it is interesting to note that San Diego, Boston and to some extent Dulles are all close to the city center with relatively few enplanements compared to the rest of this subset. Also, I grew up in Fort Collins, CO and have many fond childhood memories of driving across the plains with Denver International Airport seeming so, so far away.
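The two data-cleaning rules described above (keep the first city when several are listed, then keep only airports with over one million enplanements) can be sketched as below. This is a Python illustration under stated assumptions, not the code behind the post: the helper names `first_city` and `busy_airports`, the "City/City, ST" label format, and every enplanement count are hypothetical.

```python
def first_city(label):
    """Keep only the first city from a label like 'Harrisburg/Middletown, PA'."""
    cities, sep, state = label.rpartition(", ")
    if not sep:  # no ", ST" suffix present
        cities, state = label, ""
    first = cities.split("/")[0].strip()
    return f"{first}, {state}" if state else first


def busy_airports(rows, minimum=1_000_000):
    """Keep (label, enplanements) rows strictly above the threshold."""
    return [(first_city(label), n) for label, n in rows if n > minimum]


# Placeholder rows: the enplanement counts are invented, not FAA figures,
# and the label format is an assumption about how the page lists cities.
ROWS = [
    ("Harrisburg/Middletown, PA", 600_000),
    ("Seattle/Tacoma, WA", 20_000_000),
    ("Peach Springs, AZ", 50_000),
]

selected = busy_airports(ROWS)
```

Taking the first city is a simplification: for split locations it can change which town center the driving distance is measured from.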

As always, full code available on GitHub

Changes in Voter Turnout Between the 2014 and 2018 US Elections

As I watched the livestream of the 2018 US midterm election results, I was absolutely stunned at the significant increase in voter turnout over the 2014 US midterm election. Now that almost all of the 2018 election results have been certified by their respective Secretaries of State, I wanted to take a look at how this increase in voter turnout manifested on a state-by-state basis.

To make things as simple as I could, I primarily used the data from two New York Times elections pages: the 2014 results page and the 2018 results page.

I realize this data may not be complete as some of the voter counts are not fully reported for all precincts on these pages. However, the total vote count from the 2014 data is 72,031,124 while the total vote count from 2018 is 106,385,810 which I felt was accurate enough for the purposes of this analysis.

In both elections, I primarily focused on the House of Representatives because that was the only office universally up for election. As I embarked on this project I soon realized that a direct comparison would not be possible everywhere, because Pennsylvania re-drew its congressional maps in early 2018 and Florida did so as well in a 2016 redistricting.

This first map simply shows congressional districts where the voter turnout increased. The congressional districts colored grey either had a decline in voter turnout or had a candidate who ran uncontested in 2014 or 2018 (or both), and therefore have no difference in percentage to measure.

Explorable version here

The second choropleth map shows states that increased in voter turnout as blue and states that decreased in voter turnout as red, while those colored grey had at least one uncontested election.

Explorable version of this map here

It is interesting to note that only seven US Congressional districts had decreases in voter turnout from 2014 to 2018. Table of districts with decreased voter turnout:

District 2014 votes 2018 votes Difference (%)
IL-09 203946 91476 -55.14
CO-01 266021 256542 -3.56
PA-02 202635 197495 -2.53
AK-00 242844 238131 -1.94
IL-07 171502 170290 -0.70
AR-04 205066 204113 -0.46
KY-05 218697 218324 -0.17
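The Difference column appears to be the percent change in votes from 2014 to 2018. A minimal Python sketch of that calculation, using the vote counts from the table, is below; the function name `percent_change` is an assumption, and the last digit can differ from the table depending on whether values are rounded or truncated.

```python
def percent_change(before, after):
    """Percent change from `before` to `after` (negative means a decline)."""
    return (after - before) / before * 100.0


# (2014 votes, 2018 votes) copied from the table above.
DISTRICTS = {
    "IL-09": (203946, 91476),
    "CO-01": (266021, 256542),
    "PA-02": (202635, 197495),
    "AK-00": (242844, 238131),
    "IL-07": (171502, 170290),
    "AR-04": (205066, 204113),
    "KY-05": (218697, 218324),
}

changes = {
    district: percent_change(votes_2014, votes_2018)
    for district, (votes_2014, votes_2018) in DISTRICTS.items()
}

for district, change in changes.items():
    print(f"{district}: {change:.2f}%")
```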

The district with the highest increase in turnout? That would be CA-34.

Finally I grouped all the votes together by state to make the following choropleth of US House votes on a state level:

Explorable version of this map here

A Visual Comparison of Votes for Two Carbon Tax Initiatives

Washington State voters were presented with two different carbon tax initiatives in the General Elections of 2016 and 2018. A full comparison of both proposals is here. While neither passed, I was curious how the Yes vote looked for both initiatives.

Map of percentage of Yes votes for Initiative 732 (General Election 2016), hover mouse cursor over county for exact Yes percentage.

Map of percentage of Yes votes for Initiative 1631 (General Election 2018), hover mouse cursor over county for exact Yes percentage.

Overall net change in Yes votes from Initiative 732 to Initiative 1631 as a percentage of the whole by county. The more red counties are where Initiative 732 was favored more while the more blue counties are where Initiative 1631 was favored more. Hover mouse cursor over county for net difference between Yes votes for both initiatives.

A Quick Look at the Seattle Mariners 2018 Attendance

On September 17, 2018, the King County Council voted 5-4 to allow for a new funding agreement between King County and sports stadiums such as Safeco Field. There have already been challenges to the legislation and it may appear as a petition on the ballot soon.

The 2018 season was unusually successful for the Mariners, and while they are again sitting out the Playoffs this season, I did wonder what attendance looked like this year.

I added two lines to this plot, one for maximum attendance (currently 47,715 according to Wikipedia) and one for mean attendance which was 28,389 this year. This means that we have a stadium that is on average about 60% full for any game for a team that competed for a Playoff spot until the very end of the season. Is that really the best use of this money?
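The "about 60% full" figure is simply mean attendance divided by capacity. A short Python check using the two numbers quoted above (the function name `fill_rate` is an assumption):

```python
CAPACITY = 47_715         # maximum attendance quoted in the post
MEAN_ATTENDANCE = 28_389  # 2018 mean attendance quoted in the post


def fill_rate(mean_attendance, capacity):
    """Share of seats filled at an average game, as a percentage."""
    return 100.0 * mean_attendance / capacity


average_fill = fill_rate(MEAN_ATTENDANCE, CAPACITY)
print(f"{average_fill:.1f}% full on average")
```

The result comes out just under 60%, which matches the rounded figure in the text.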

Sentiment Analysis of Candidate Statements by Senate Candidates

The 2018 Washington State Primary was held on August 7, 2018. As a registered voter in Washington State I am mailed a Voter’s Information Pamphlet which lists the candidates and a Candidate Statement (provided by the candidate) for each office. The Candidate Statement is where the candidate is allowed to write anything they want as long as it’s under 300 words. I was curious if there was any relationship between the sentiment of a candidate’s Statement and the number of votes that candidate received.

I conducted my analysis using sentiment analysis which groups words together based on pre-defined lists of words that are members of that group. There are many word groups to choose from such as “joy” and “trust” but for this analysis I just looked at “positive” and “negative” words (as classified by NRC).
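As a rough illustration of lexicon-based counting, here is a minimal Python sketch. It is not the code behind the post: the six-word `LEXICON` is an invented stand-in for the much larger NRC lexicon, each word is mapped to a single label for simplicity, and the sample statement is made up.

```python
import re
from collections import Counter

# Tiny stand-in lexicon, invented for illustration only. The real NRC
# lexicon is far larger and is not reproduced here.
LEXICON = {
    "hope": "positive",
    "trust": "positive",
    "strong": "positive",
    "fear": "negative",
    "crisis": "negative",
    "fail": "negative",
}


def sentiment_counts(text):
    """Return (positive, negative) word counts for a statement."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(LEXICON[word] for word in words if word in LEXICON)
    return counts["positive"], counts["negative"]


# Made-up example statement, not taken from any candidate.
statement = "I trust voters and hope for a strong economy, not fear and crisis."
positive, negative = sentiment_counts(statement)
```

Note that this approach counts words in isolation, so a phrase like "not fear" still adds to the negative tally.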

Although there were many different offices up for election in this Primary, the U.S. Senate race had 29 candidates which made for a very rich dataset. Washington State uses a Top-Two Primary which allows for easy comparison across political parties.

There are many factors deliberately ignored by this analysis such as PVI, incumbency, fundraising, political party and candidate issues. However, I thought text mining could be an interesting way to analyze these candidates in a slightly different manner.

First I just plotted the number of positive words in the Candidate Statement for each candidate:

Then I repeated with the number of negative words in the Candidate Statement for each candidate:

Next I looked at the number of positive words in the Candidate Statement by candidate versus the number of votes that candidate received. Because of the strength of the Democratic incumbent candidate Maria Cantwell and the Republican establishment candidate Susan Hutchison I had to log-transform the vote counts because these two candidates got so much of the total vote.
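To see why the log transform helps, consider the sketch below. The vote counts are hypothetical placeholders, not the actual primary results, and base 10 is an assumption since the post does not say which base was used.

```python
import math

# Hypothetical vote counts: two dominant candidates and two minor ones.
votes = {
    "Candidate A": 900_000,
    "Candidate B": 400_000,
    "Candidate C": 1_500,
    "Candidate D": 300,
}

# Base-10 logs compress the scale while keeping the ordering intact.
log_votes = {name: math.log10(count) for name, count in votes.items()}

raw_ratio = max(votes.values()) / min(votes.values())
log_spread = max(log_votes.values()) - min(log_votes.values())
```

On the raw scale the largest count here is 3,000 times the smallest, so the minor candidates would be squashed against the axis; on the log scale the same candidates are separated by only a few units.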

Then I repeated this analysis by looking at the negative word count for each candidate versus the total log-transformed vote count:

While I don’t think this will lead to any novel political insights, I do think it’s an interesting way to look at candidates. Full code including individual candidate statements available here.

Should Seattle Convert Public Golf Courses Into Land for Housing?

Like many other metropolitan areas, Seattle is currently dealing with a serious housing shortage. Recently, there have been some good articles (here and here) about converting public golf courses into housing. It is an interesting concept and certainly should be discussed. However, one aspect I feel is being overlooked in this debate is how popular these golf courses are in the first place. According to Bloomberg News, there is declining interest in golf nationwide. Is this happening in Seattle?

The City of Seattle is somewhat aware of this issue and is at least discussing options for the golf courses. I filed a Public Records Act request with the City of Seattle and they sent me data on the rounds of golf played for the past three years.

Unfortunately, the city only provided the number of rounds of golf played per year, so I cannot look at more granular trends. However, I can plot the counts on an annual basis.

It will be interesting to see if this debate on converting public golf courses to housing goes anywhere, both in Seattle and in other major cities. I don’t have a horse in this particular race. I live near the Jefferson Park course and I really appreciate the large expanse of green it provides; however, I am acutely aware of the need for more housing within Seattle city limits.

Interested in taking a look? I put the full data I got from the city here.

A Visual Ranking of Seattle Public Elementary Schools

One of the things people consistently tell me when they are considering buying or renting a home in a new location is that they want to move somewhere with “good schools.” This always makes me wonder how we quantify what schools are considered “good”? To start you might look for information on school reputations or performance using local school data fact sheets from realtors or apartment managers or, more likely, by searching ranking and review sites such as Niche. This approach is generally fine but it assumes that you are only interested in a specific neighborhood which may be difficult to achieve right now in most major cities across the US and especially in Seattle.

What if instead we looked at all the neighborhood school ratings in a city simultaneously? This would allow the reader to spot trends and make visual comparisons as well as possibly identify overperforming schools in unexpected areas. Fortunately, Seattle Public Schools (SPS) provides quite a lot of data about their schools which makes this easy to visualize.


  • For this analysis I just used the SPS data for the 2016-17 school year.

  • I focused solely on elementary school data for the 2016-17 school year. I used the SPS district boundary map for all public elementary schools in the City of Seattle and ignored any magnet or alternative elementary schools.


Initially I focused on three questions:

  • What school has the best student/teacher ratio?
  • What school reports the best attendance?
  • What schools are best for reading and math?

I took the SPS data for student/teacher ratio, attendance rate, and reading and math proficiency scores for each school and calculated their rank within the city to make this table. Click on the category name to sort by that category.

School Name Attendance Rank Student/Teacher Ratio Rank Grade 3 Math Rank Grade 3 Reading Rank
Adams 12 53 29 21
Alki 20 35 12 15
Arbor Heights 21 31 45 33
Gatzert 42 4 49 56
Beacon Hill Int’l 1 32.50 30 37
B.F. Day 16 43 25 23
Broadview-Thomson K-8 49 2 46 45
Bryant 6 55 3 4
Cascadia 2 59 1 1
Catharine Blaine K-8 38 15 6 13
Concord Int’l 44 22 59 60
Bagley 10 20 24 19
Dearborn Park Int’l 61 61 61 61
Dunlap 43 5 56 55
Emerson 58 14 43 53
Fairmount Park 22 47 2 3
Coe 9 48 10 9
Gatewood 35 26 33 40
Genesee Hill 24 50 16 20
Graham Hill 39 10 55 49
Green Lake 31 37 26 28
Greenwood 15 54 20 17
Hawthorne 46 16 48 41
Highland Park 48 6 57 50
Hay 8 38 19 10
John Muir 23 25 52 48
John Rogers 33 24 34 34
Kimball 30 32.50 40 36
Lafayette 29 44 27 25
Laurelhurst 34 49 15 24
Lawton 26 42 4 6
Leschi 53 36 37 44
Lowell 60 3 58 54
Loyal Heights 17 58 5 7
Madrona 52 1 47 43
Maple 19 28 31 32
MLK Jr. 45 8 54 58
McDonald International 7 30 23 2
McGilvra 25 40 14 11
Montlake 37 51 9 16
North Beach 14 46 13 12
Northgate 51 11 50 51
Olympic Hills 41 34 21 30
Olympic View 27 56 32 29
Queen Anne 28 57 28 22
Rainier View 55 21 18 26
Roxhill 56 13 53 57
Sacajawea 36 7 42 42
Sand Point 47 27 39 35
Sanislo 57 12 60 59
Stevens 32 29 35 31
Thornton Creek 18 17 41 38
Thurgood Marshall 5 39 8 18
Van Asselt 59 9 51 52
Viewlands 40 19 44 47
View Ridge 3 52 11 5
Wedgwood 11 41 7 14
West Seattle Elem 54 23 38 46
West Woodland 13 45 17 8
Whittier 4 60 22 27
Wing Luke 50 18 36 39
Schmitz Park 62 62 62 62
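The two 32.50 entries in the Student/Teacher Ratio Rank column look like a tie in which both schools share the average of ranks 32 and 33. A minimal Python sketch of ranking with averaged ties is below. It is an illustration, not the code behind the table: the function name `average_ranks` and the sample ratios are hypothetical, and treating a lower ratio as a better rank is an assumption.

```python
def average_ranks(values, reverse=False):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based ranks in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical student/teacher ratios for four schools (lower ranks first).
ratios = [14.0, 16.5, 16.5, 18.0]
ranks = average_ranks(ratios)
```

For a metric where higher is better, such as attendance rate, passing `reverse=True` would put the highest value at rank 1.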

What jumps out at me most is that no particular school consistently out-performs the others, which can make it challenging to decide what to prioritize when choosing a school.

Student/Teacher ratio

Each school reports the number of enrolled students and the number of teachers which I simply used to calculate a ratio.

Click on an attendance area for the exact percentage.


I was initially interested in student attendance data, but the elementary school with the lowest daily attendance was Lowell Elementary with an attendance rate of 89%. Every other school reported an attendance rate at or above 95% which did not make for a very interesting map. I later learned that Washington State has a compulsory attendance law which likely affected these numbers.

Reading proficiency

I was interested in looking at district-wide third grade reading achievement scores as measured by the Washington State proficiency test. I chose third grade because that is the first year a Washington State standardized test is administered for reading.

Click on an attendance area for the exact percentage.

Math proficiency

Similarly, I looked at the district-wide third grade math achievement scores as measured by the Washington State proficiency test.

Click on an attendance area for the exact percentage.

Family engagement

SPS provides a parent survey with a variety of questions evaluating parent enthusiasm and approval of Seattle schools. These survey results are not published, so I looked at how many families completed these surveys for the 2016-17 school year.

Click on an attendance area for the exact percentage.

Even the school with the most responses reported that only 49.1% of families responded to the survey, which to me suggests that most families are satisfied with their school but neither especially excited nor disappointed by their school experiences.

tl;dr Choosing a school is hard but ultimately it comes down to how satisfied the parents or guardians are with the school. Schools report on a wide array of metrics about student performance, but performance is often an issue of secondary importance when compared to parents’ overall perception of the school quality.

Visualizing Flight Data for the 2017 Seattle Mariners

Remember this map that Facebook created of friend connections back in 2011?

I thought it was pretty cool back then and I still think it’s pretty cool. I wanted to make a similar map but was not sure where to start. I could have done a similar visualization, but I recently quit Facebook so I can no longer export all my friends’ data to use for making maps. My next thought was visualizing travel routes such as flight information. I am trying to reduce my carbon footprint, which meant I only flew five times in 2017 and have flown exactly zero times so far in 2018. Then I thought, you know who does fly a lot? The Seattle Mariners.

The first step was to collect all the Mariners game data. Fortunately, Baseball Reference has all that data in an easily accessible HTML table.

The next step was to geolocate all the stadiums, which can be a bit tedious. Fortunately, GitHub user the55 created a nice JSON file of all the stadiums and posted it as a gist. I then used an R library called geosphere, which implements the Haversine formula, to calculate the distance between two stadiums.
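For readers unfamiliar with the Haversine formula, here is a minimal Python sketch of it. This is a stand-in for illustration, not the geosphere call used for the post: the function name `haversine_km`, the Earth radius constant, and the approximate stadium coordinates are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
SEATTLE = (47.591, -122.332)
OAKLAND = (37.752, -122.201)

trip_km = haversine_km(*SEATTLE, *OAKLAND)
```

The formula gives the straight-line distance over the Earth's surface, so it slightly understates what an actual flight path covers.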

My initial attempt here:

In order to make the image look similar to the Facebook connection map, I ended up using this Flowing Data post quite a bit to figure out how to add the lines and change the background color:

Finally, because there were so many trips from Seattle to American League West opponents, I ended up adding a bit of noise, or jitter, to the stadium locations so that the flight paths would not perfectly overlap each other.
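The jitter idea can be sketched in Python as below. This is an illustration rather than the code behind the map: the function name `jitter`, the uniform noise, the 0.5 degree scale, and the approximate coordinates are all assumptions.

```python
import random


def jitter(lat, lon, scale=0.5, rng=random):
    """Offset a point by uniform noise of up to `scale` degrees per axis."""
    return (lat + rng.uniform(-scale, scale),
            lon + rng.uniform(-scale, scale))


rng = random.Random(42)     # seeded so the sketch is repeatable
SEATTLE = (47.591, -122.332)  # approximate, for illustration only

# Each repeated trip gets its own slightly shifted endpoint.
endpoints = [jitter(*SEATTLE, rng=rng) for _ in range(3)]
```

The scale is a trade-off: too small and the lines still overlap, too large and the routes stop pointing at the right city.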

Looking back at this 2017 data reminded me that the Mariners finished 78-84 that year. Here’s hoping for a better season in 2018!

If interested, I put all the code for this analysis here

Further Analysis of the 2017-18 WA State Legislature

This is my second post looking at the data from the 2017-18 Washington State Legislative Session. The first part can be read here.

After some time looking at different bills that did pass, I started to wonder if a bill was more likely to pass if it had more sponsors. First I took the 647 bills passed by the Legislature and signed into law by the Governor and looked up how many co-sponsors each bill had:

Then I took every bill that was introduced but did not become law and counted up the sponsors for these:

So it appears that the number of sponsors is not particularly predictive of a bill becoming law. The three bills introduced in the Senate with the highest number of sponsors were:

Bill Sponsor count Summary
5598 40 Granting relatives, including but not limited to grandparents, the right to seek visitation with a child through the courts.
6037 28 Concerning the uniform parentage act.
5375 27 Renaming the cancer research endowment authority to the Andy Hill cancer research endowment.

And in the House:

Bill Sponsor count Summary
2282 52 Protecting an open internet in Washington state.
1714 45 Concerning nursing staffing practices at hospitals.
1400 42 Creating Washington state aviation special license plates.

In November 2017, Manka Dhingra won a special election and the Washington State Senate flipped from Republican-held to Democrat-held. Initially I wanted to compare the number of bills passed by a Republican-held Senate versus a Democrat-held Senate, but there were too many extraneous variables, such as passing a budget and a shorter session in 2018. Instead, I decided to focus on the number of Yea votes by bill:

Many of the bills passed with almost overwhelming support. It is refreshing to see that there is quite a bit of bipartisanship in Washington State in 2018.

As always, analysis code on GitHub