A Quick Look at the Seattle Mariners 2018 Attendance

On September 17, 2018, the King County council voted 5-4 to allow for a new funding agreement between King County and sports stadiums such as Safeco Field. There have already been challenges to the legislation and it may appear as a petition on the ballot soon.

The 2018 season was unusually successful for the Mariners, and while they are again sitting out the Playoffs this season I did wonder what attendance looked like this year.

I added two lines to this plot, one for maximum attendance (currently 47,715 according to Wikipedia) and one for mean attendance which was 28,389 this year. This means that we have a stadium that is on average about 60% full for any game for a team that competed for a Playoff spot until the very end of the season. Is that really the best use of this money?

mariners, r,

Sentiment Analysis of Candidate Statements by Senate Candidates

The 2018 Washington State Primary was held on August 7, 2018. As a registered voter in Washington State I am mailed a Voter’s Information Pamphlet which lists the candidates and a Candidate Statement (provided by the candidate) for each office. The Candidate Statement is where the candidate is allowed to write anything they want as long as its under 300 words. I was curious if there was any relationship between the sentiment of a candidate’s Statement and the number of votes that candidate received.

I conducted my analysis using sentiment analysis which groups words together based on pre-defined lists of words that are members of that group. There are many word groups to choose from such as “joy” and “trust” but for this analysis I just looked at “positive” and “negative” words (as classified by NRC).

Although there were many different offices up for election in this Primary, the U.S. Senate race had 29 candidates which made for a very rich dataset. Washington State uses a Top-Two Primary which allows for easy comparison across political parties.

There are many factors deliberately ignored by this analysis such as PVI, incumbency, fundraising, political party and candidate issues. However I thought text mining could be an interesting way to analyze these candidates in a slightly different manner.

First I just plotted the number of positive words in the Candidate Statement for each candidate:

Then I repeated with the number of negative words in the Candidate Statement for each candidate:

Next I looked at the number of positive words in the Candidate Statement by candidate versus the number of votes that candidate received. Because of the strength of the Democratic incumbent candidate Maria Cantwell and the Republican establishment candidate Susan Hutchison I had to log-transform the vote counts because these two candidates got so much of the total vote.

Then I repeated this analysis by looking at the negative word count for each candidate versus the total log-transformed vote count:

While I don’t think this will lead to any novel political insights, I do think its an interesting way to look at candidates. Full code including individual candidate statements available here

politics, r,

Should Seattle Convert Public Golf Courses Into Land for Housing?

Like many other metropolitan areas, Seattle is currently dealing with a serious housing shortage. Recently, there have been some good articles (here and here) about converting public golf courses into housing. It is an interesting concept and certainly should be discussed, however one aspect I feel is being overlooked in this debate is how popular are these golf courses anyways? According to Bloomberg News, there is declining interest in golf nationwide, is this happening in Seattle?

The City of Seattle is somewhat aware of this issue and is at least discussing options for the golf courses. I filed a Public Records Request act with the City of Seattle and they sent me data on the rounds of golf played for the past three years.

Unfortunately the city only provided the count data for number of rounds of golf played per year so I cannot look at more granular trends. However, I can plot the counts on an annual basis.

It will be interesting to see if this debate both on converting public golf courses to housing goes anywhere both in Seattle and other major cities. I don’t have a horse in this particular race, I live near the Jefferson Park course and I really appreciate the large expanse of green it provides however I am acutely aware of the need for more housing within Seattle city limits.

Interested in taking a look? I put the full data I got from the city here.

seattle,golf

A Visual Ranking of Seattle Public Elementary Schools

One of the things people consistently tell me when they are considering buying or renting a home in a new location is that they want to move somewhere with “good schools.” This always makes me wonder how we quantify what schools are considered “good”? To start you might look for information on school reputations or performance using local school data fact sheets from realtors or apartment managers or, more likely, by searching ranking and review sites such as Niche. This approach is generally fine but it assumes that you are only interested in a specific neighborhood which may be difficult to achieve right now in most major cities across the US and especially in Seattle.

What if instead we looked at all the neighborhood school ratings in a city simultaneously? This would allow the reader to spot trends and make visual comparisons as well as possibly identify overperforming schools in unexpected areas. Fortunately, Seattle Public Schools (SPS) provides quite a lot of data about their schools which makes this easy to visualize.

Setup

For this analysis I just the SPS data for the 2016-17 school year.
I focused solely on elementary school data for the 2016-17 school year. I used the SPS district boundary map for all public elementary schools in the City of Seattle and ignored any magnet or alternative elementary schools.

Rankings

Initially I focused on three questions:

What school has the best student/teacher ratio?
What school reports the best attendance?
What schools are best for reading and math?

I took the SPS data for student/teacher ratio, attendance rate, and reading and math proficiency scores for each school and calculated their rank within the city to make this table. Click on the category name to sort by that category.

School Name	Attendance Rank	Student/Teacher Ratio Rank	Grade 3 Math Rank	Grade 3 Reading Rank
Adams	12	53	29	21
Alki	20	35	12	15
Arbor Heights	21	31	45	33
Gatzert	42	4	49	56
Beacon Hill Int’l	1	32.50	30	37
B.F. Day	16	43	25	23
Broadview-Thomson K-8	49	2	46	45
Bryant	6	55	3	4
Cascadia	2	59	1	1
Catharine Blaine K-8	38	15	6	13
Concord Int’l	44	22	59	60
Bagley	10	20	24	19
Dearborn Park Int’l	61	61	61	61
Dunlap	43	5	56	55
Emerson	58	14	43	53
Fairmount Park	22	47	2	3
Coe	9	48	10	9
Gatewood	35	26	33	40
Genesee Hill	24	50	16	20
Graham Hill	39	10	55	49
Green Lake	31	37	26	28
Greenwood	15	54	20	17
Hawthorne	46	16	48	41
Highland Park	48	6	57	50
Hay	8	38	19	10
John Muir	23	25	52	48
John Rogers	33	24	34	34
Kimball	30	32.50	40	36
Lafayette	29	44	27	25
Laurelhurst	34	49	15	24
Lawton	26	42	4	6
Leschi	53	36	37	44
Lowell	60	3	58	54
Loyal Heights	17	58	5	7
Madrona	52	1	47	43
Maple	19	28	31	32
MLK Jr.	45	8	54	58
McDonald International	7	30	23	2
McGilvra	25	40	14	11
Montlake	37	51	9	16
North Beach	14	46	13	12
Northgate	51	11	50	51
Olympic Hills	41	34	21	30
Olympic View	27	56	32	29
Queen Anne	28	57	28	22
Rainier View	55	21	18	26
Roxhill	56	13	53	57
Sacajawea	36	7	42	42
Sand Point	47	27	39	35
Sanislo	57	12	60	59
Stevens	32	29	35	31
Thornton Creek	18	17	41	38
Thurgood Marshall	5	39	8	18
Van Asselt	59	9	51	52
Viewlands	40	19	44	47
View Ridge	3	52	11	5
Wedgwood	11	41	7	14
West Seattle Elem	54	23	38	46
West Woodland	13	45	17	8
Whittier	4	60	22	27
Wing Luke	50	18	36	39
Schmitz Park	62	62	62	62

What jumps out at me most is that no particular school consistently out-performs the others, which can make it challenging to decide what to prioritize when choosing a school.

Student/Teacher ratio

Each school reports the number of enrolled students and the number of teachers which I simply used to calculate a ratio.

Click on an attendance area for the exact percentage.

Attendance

I was initially interested in student attendance data, but the elementary school with the lowest daily attendance was Lowell Elementary with an attendance rate of 89%. Every other school reported an attendance rate at or above 95% which did not make for a very interesting map. I later learned that Washington State has a compulsory attendance law which likely affected these numbers.

Reading proficiency

I was interested in looking at district-wide third grade reading achievement scores district-wide for 3rd graders as measured by the Washington State proficiency test. I chose third grade because that is the first year a Washington State standardized test is administered for reading.

Click on an attendance area for the exact percentage.

Math proficiency

Similarly, I looked at the district-wide third grade math achievement scores as measured by the Washington State proficiency test

Click on an attendance area for the exact percentage.

Family engagement

SPS provides a parent survey with a variety of questions evaluating parent enthusiasm and approval of Seattle schools. These survey results are not published, so I looked at how many families completed these surveys for the 2016-17 school year.

Click on an attendance area for the exact percentage.

Even the school with the most responses reported that only 49.1% of families responded to the survey which to me means that most families are satisfied with their school but neither especially excited or disappointed by their school experiences.

tl;dr Choosing a school is hard but ultimately it comes down to how satisfied the parents or guardians are with the school. Schools report on a wide array of metrics about student performance, but performance is often an issue of secondary importance when compared to parents’ overall perception of the school quality.

schools, seattle,

Visualizing Flight Data for the 2017 Seattle Mariners

Remember this map that Facebook created of friend connections back in 2011?

I thought it was pretty cool back then and I still think its pretty cool. I wanted to make a similar map but was not sure where to start. I could have done a similar visualization however I recently quit Facebook so I can no longer export all my friend’s data to use for making maps. My next thought was visualizing travel routes such as flight information. I am trying to reduce my carbon footprint which meant I only flew five times in 2017 and have flown exactly zero times so far in 2018. Then I thought, you know who does fly alot? The Seattle Mariners.

First step was to collect all the Mariners game data, fortunately Baseball Reference has all that data in an easily accessible HTML table.

Next step was to geolocate all the stadiums which can be a bit tedious. Fortunately GitHub user the55 created a nice JSON file of all the stadiums and put it as a gist. I was able to use an R library called geosphere for using the Haversine formula to calculate the distance between two stadiums.

My initial attempt here:

In order to make the image look similar to the Facebook connection map, I ended up using this Flowing Data post quite a bit to figure out how to add the lines and change the background color:

Finally because there were so many trips from Seattle to American League West opponents that I ended up adding a bit of noise or jitter to the stadium locations to make the flight paths not perfectly overlap each other.

Looking back at this 2017 reminded me the Mariners finished 78-84 in 2017, here’s hoping to a better season in 2018!

If interested, I put all the code for this analysis here

baseball, r,

Further Analysis of the 2017-18 WA State Legislature

This is my second post looking at the data from the 2017-18 Washington State Legislative Session. the first part of this blog can be read here

After some time looking at different bills that did pass, I started to wonder if a bill was more likely to pass if it had more sponsors. First I took the 647 bills passed by the Legislation and signed into law by Governor and looked up how many co-sponsors each bill had:

Then I I took every bill that was introduced but did not become law and counted up the sponsors for these:

So it appears that the number of sponsors is not particulary predictive for a bill becoming law. The three bills introduced in the Senate with the highest number of Sponsors were:

Bill	Sponsor count	Summary
5598	40	Granting relatives, including but not limited to grandparents, the right to seek visitation with a child through the courts.
6037	28	Concerning the uniform parentage act.
5375	27	Renaming the cancer research endowment authority to the Andy Hill cancer research endowment.

And in the House:

Bill	Sponsor count	Summary
2282	52	Protecting an open internet in Washington state.
1714	45	Concerning nursing staffing practices at hospitals.
1400	42	Creating Washington state aviation special license plates.

In November 2017, Manka Dhingra won a special election and the Washington State Senate flipped from Republican held to Democrat held. Initially I wanted to focus on the number of bills passed by a Republican held Senate versus a Democrat held Senate but there were too many extraneous variables such as passing a budget and a shorter session in 2018. Instead, I decided to focus on the number of Yea votes by bill

Many of the bills passed were with almost overwhelming support, which is refreshing to see that there is quite a bit of bipartisanship in Washington State in 2018.

As always, analysis code on GitHub

legislature, r,

Visualizing the 2017-18 WA State Legislature

In his 2018 State of the State speech, Washington State Governor Jay Inslee made a passioned appeal for a carbon tax and proposed one in Washington State Senate bill 6203. Because of this, I paid more attention to the activities of the Washington State Legislature than I ever had before and I found it fascinating.

First off, lets start with the website for the state Legislature. Here is a screenshot of the Washington State Legislature page for SB 6203 which is the bill I was most interested in:

The website is very resource dense and well worth time exploring when the Legislature is in session. Every piece of proposed legislation shows the same amount of information and allows you to easily find and contact your legislators about a particular bill if interested. The site also has livestreams of committee hearings and displays vote counts on bills in almost real time as the votes are tallied on both the Senate and the House floor.

Is Washington State unique in this regard? Of course not, here is a screen shot for an interesting bill in Legislature for the State of California.

Finally here is a screenshot of a House bill on the United States Congress website

Does ease of use of the website increase participation in the civic process at the state level? That is a difficult question to answer but personally I am glad I get to use the Washington State one instead of the California State Legislature webpage.

The 2017-18 Washington State Legislative Session ended on March 8, 2018 and Governor Inslee then had 21 days to sign bills into law or veto them.

The conclusion of the 2017-18 Session made me wonder what happened to those bills that were introduced and how many of them actually became law. In addition to a great website, the Washington State Legislature also has an excellent set of Web Services that allow for programmatically capturing metrics and data about activities in the state legislation. One way to easily visualize this is with a Sankey Diagram (no relation to this Sankey though).

Here is a smaller image of the diagram with a larger version here

Code to generate this figure available on my GitHub repo

d3, legislature,, r,

Has the Pac-12 Network Decreased UW Home Football Game Attendance UPDATED

Following up on my earlier post, how much has the Pac-12 Network affected game attendance? I updated my previous data set to include the past two seasons so as to include 2008-2017 data. I relied on home game attendance as reported by Wikipedia and also used Wikipedia to determine what TV network broadcast each home game. In an ideal world I would be able to make better comparisons using the Nielsen rating for each game however my guess is that data does not come as cheap or as easily as data from Wikipedia. For the purposes of this analysis I am neglecting various other factors in this anaysis such as time at kickoff, game day temperature, opponent, ranking of UW, ranking of opponent, etc… the list goes on and on. My main intention was to simply show home game attendance versus TV network for all games:

And attendance for Pac-12 only opponents versus TV network:

Based on the available data it appears that attendance during home games has been influenced and possibly decreased by the Pac-12 Network but it is difficult to say for sure while ignoring so many external factors. With a significant budget deficit still a major issue, one can only hope that losses from game day ticket sales are made up for with Pac-12 Network advertising revenue.

football, r,

States With Multiple Football Teams in the AP Top 25

With WSU beating Oregon and UW beating UC Berkeley, the State of Washington is poised to have two football teams in the top ten of NCAA Division I football rankings. Naturally this got me thinking, how often does this happen and how many states have had this same achievement?

To answer this I used the weekly results of Associated Press poll which started in 1936 and thanks to our good friends at Wikipedia, I was able to get AP Poll results for every week.

I found that 25 states had at least one week where two teams from that state were in the AP Poll. However, the more I thought about it the more I realized this was slightly biased because some states might only have one team (i.e. Wyoming) while other states might have two Division I teams that are never both great at the same time (i.e. Montana). I tightened down my restrictions a bit and only looked at the top 10 teams from each AP Poll.

Surprisingly, of the 25 states with at least two teams in the AP top 25 Poll, 21 of those states had a week with at least two teams from that state in the AP top 10. I made a summary table with the most recent year each state achieved this distinction listed:

state	year
Louisiana	1936
Maryland	1955
North Carolina	1957
New York	1958
Illinois	1963
Indiana	1979
Pennsylvania	1982
Colorado	1994
Kansas	1995
Washington	1997
Ohio	2009
Oregon	2012
Florida	2013
South Carolina	2013
Georgia	2014
Mississippi	2014
California	2015
Alabama	2016
Michigan	2016
Texas	2016
Oklahoma	2017

Then, I thought what if there were ever a week when a state had 3 teams in the AP top 10. Sure enough, four states have achieved this:

state	year
California	1952
Indiana	1967
Florida	2005
Texas	2015

As always, all of my code for this is on GitHub

football, r,, wikipedia,

Further Exploration of IMDb TV Show Rating Data

I wanted to revist my previous post continuing to look at using linear regression for determining the best episodes of a TV show to watch. I started to think about how to look at this data for multiple TV shows. Performing a linear regression on show rating by episode number within a season quickly allows us to determine the maximum and minimum residual for all the show episodes. I took this a step further and calculated which episode of the show it was. For example, here are all the episodes with residual value for that particular show Master of None:

Season	Episode	Name	Residual	count	appearance
1	1	Plan B	-0.28	1	0.05
1	2	Parents	0.21	2	0.1
1	3	Hot Ticket	0.01	3	0.15
1	4	Indians on TV	0.21	4	0.2
1	5	The Other Man	-0.09	5	0.25
1	6	Nashville	0.31	6	0.3
1	7	Ladies and Gentlemen	-0.39	7	0.35
1	8	Old People	-0.09	8	0.4
1	9	Mornings	0.11	9	0.45
1	10	Finale	0.01	10	0.5
2	1	The Thief	0.44	11	0.55
2	2	Le Nozze	-0.36	12	0.6
2	3	Religion	-0.27	13	0.65
2	4	First Date	0.13	14	0.7
2	5	The Dinner Party	0.02	15	0.75
2	6	New York, I Love You	0.42	16	0.8
2	7	Door #3	-0.89	17	0.85
2	8	Thanksgiving	0.31	18	0.9
2	9	Amarsi Un Po	0.30	19	0.95
2	10	Buona Notte	-0.10	20	1

We can see that the episode with the highest residual is S2E1 “The Thief” and the episode with the lowest residual is S2E7 “Door #3”. For every TV show I took all the episodes and calculated their order as a percent of the total number of episodes - for example the pilot episode would be 0.0 and the series finale would be 1.0 to generate an index. I then took the maximum and minimum residual values for each show and plotted them against that episode. For example here is a plot of just Master of None:

To obtain data on as many shows as I could I used this IMDb list of shows with over 5000 votes and selected the first 1200 shows as a dataset. I then reused the OMDb API as I did before. I then calculated the same values as I did for Master of None above and plotted them in a similar manner (use the mouseover for more information on each point):

Two things immediately jump out at me:

The density of points right around the zero line shows that linear regression is a pretty good metric to use for this type of analysis and that most people rate the show generally in line with the overall trend for that particular season.
There seems to be a tendancy for people to really love or really hate the series finale of TV shows and this shows up by the sheer number of points at 1. Possibly this is people expressing their overall view of the show as a whole or maybe people really were really happy or unhappy with the series finale.

I put some of the main code I used in a GitHub repository

r,python,imdb