Mapping Seattle Traffic Circles

Recently, there was a post on Priceonomics about traffic circles arguing that roundabouts are safer, improve traffic flow, and reduce emissions. The post mentions that there are 3,700 roundabouts in the United States, which made me wonder: how many of those are in Seattle?

Fortunately, the City of Seattle data site has GIS data for all streets as well as for all traffic circles. The traffic circle dataset has 1,042 entries, or about a third of the roundabouts in the entire United States. I'm not sure how accurate that comparison is, but focusing on traffic circles within the city limits is more interesting anyway.

I used the sp library in R to read in both sets of shapefiles and quickly determine the streets with the highest number of traffic circles (a sketch of this step follows the table):

Street           Traffic circles
FREMONT AVE N    27
1ST AVE NW       24
8TH AVE NE       23
DAYTON AVE N     23
6TH AVE NW       21
12TH AVE NE      18
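
A rough sketch of that counting step, under the assumption that the traffic circle shapefile carries a street-name field (called STREET here; the actual field name, dsn, and layer are placeholders):

library(rgdal)
library(sp)

# read the traffic circle shapefile (dsn/layer names are placeholders)
circles <- readOGR(dsn = "traffic_circles", layer = "traffic_circles")

# tally circles by the street they sit on and look at the top streets
counts <- as.data.frame(table(circles$STREET))
names(counts) <- c("street", "count")
head(counts[order(-counts$count), ])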

Fremont Ave N. has a whopping 27 traffic circles, which seems excessive until you realize that most of them are north of the zoo rather than in the denser southern stretch of the street.

I thought quite a bit about how best to represent these values and ultimately settled on mapping each street colored by the number of traffic circles on it. I used R and Color Brewer to build a color attribute, based on traffic circle count, that Leaflet could read. I wrote out my GeoJSON files using rgdal::writeOGR(), found the result far too large to render reasonably in a browser, and converted it to TopoJSON with topojson -o colored_traffic_circles2.topojson -p color colored_traffic_circles.geojson. Even this rendered too slowly, so I reduced the map to streets with more than two traffic circles.
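
The color attribute itself can be built with RColorBrewer; a minimal sketch, assuming a SpatialLinesDataFrame called streets with a circle_count column (both names are hypothetical):

library(RColorBrewer)
library(rgdal)

# bin the counts and map each bin to a Color Brewer hex value
pal <- brewer.pal(5, "Blues")
streets$color <- pal[cut(streets$circle_count, breaks = 5, labels = FALSE)]

# write the colored streets out for the Leaflet/TopoJSON step to pick up
writeOGR(streets, dsn = "colored_traffic_circles.geojson", layer = "streets",
         driver = "GeoJSON", check_exists = FALSE)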

Here the darker the color, the more traffic circles on that street. To me, the most interesting feature is how many north-south streets with traffic circles sit right next to State Highway 99 or I-5. Is this an attempt to mitigate cut-through traffic from people using surface streets instead of the arterials? Possibly, though it is difficult to say. Regardless, 1,042 is an impressive number of traffic circles for a metropolitan area of this size.

Notes

  • The City dataset includes installation dates for the traffic circles; the oldest, at 18TH AVE E AND E HARRISON ST, was installed on 1/5/1976
  • Full size version of this map available here
  • All code available here

Subsetting Shapefiles With R

I have been trying to improve my GIS skills lately, using R for as much of the process as I can. One of the tasks I frequently perform is taking a shapefile, subsetting it, and then converting it to GeoJSON. The npm module ogr2ogr is excellent for converting a shapefile to GeoJSON; however, I frequently find myself needing to select only certain areas of a shapefile. I have been using two R libraries to achieve this, specifically rgdal and sp.

For example, let's use the Congressional District 2012 shapefiles from the Washington State Office of Financial Management. Download the file, unzip it, and load it into R with something like the following (the dsn and layer names below are assumptions):
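
library(rgdal)
library(sp)

# dsn and layer names are assumptions about the unzipped download
wa.cd <- readOGR(dsn = "cd2012", layer = "cd2012")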

We want to select only the districts that cover Seattle, 7 and 9, which is as simple as subsetting:

seattle.only <- subset(wa.cd, CD113FP %in% c('07', '09'))

One of the nice features of GitHub gists is that you can overlay a GeoJSON file on a Google map for a quick QC check. While R accepts a variety of projection formats, GitHub does not, and I occasionally find I have to convert to the WGS84 datum, which is easily done with

seattle.only.wgs <- spTransform(seattle.only, CRS("+proj=longlat +ellps=WGS84"))

And written out as a GeoJSON file with

writeOGR(seattle.only.wgs, dsn="seattle.only.wgs.geojson", layer="cd2012", driver="GeoJSON", check_exists = FALSE)

Occasionally I get an error about the file I am about to create not being found. This Stack Overflow answer was very helpful and now I add the check_exists = FALSE parameter every time I write out with writeOGR().

Fremont Bridge Opening Times

I bike across the Fremont Bridge twice a day; Wikipedia claims it is the most frequently opened bridge in the United States. That claim is uncited, but it may well be true: under federal maritime law, boats get precedence for bridge openings, with the exception of rush hour, which in Seattle is 7-9 AM and 4-6 PM on weekdays. I often reach the bridge on my bike around 9 AM and 6 PM, and it has always felt like the bridge opens for a boat right at 9 and 6 on the dot. I wanted to verify this, but the only way I could think of was manually timing the bridge openings, which seemed like too much effort.

Recently, a friend pointed me to the Twitter account of Seattle DOT bridges, which is basically a bot that posts bridge openings and closings such as:

I used the excellent twitteR package to pull the past month of tweets from Seattle DOT bridges and test how accurate my assumption was. From these, I pulled the first opening after the morning rush and after the evening rush, for weekdays only.
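
A minimal sketch of the scraping step with twitteR, assuming an authorized Twitter app and that the bot's handle is SDOTbridges (the handle and credential variables are placeholders):

library(twitteR)
library(lubridate)

# authenticate and pull the bot's recent timeline
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
bridge_tweets <- userTimeline("SDOTbridges", n = 3200)
bridge_df <- twListToDF(bridge_tweets)

# convert timestamps to local time and keep weekday posts only
bridge_df$created <- with_tz(bridge_df$created, "America/Los_Angeles")
weekday_posts <- subset(bridge_df, wday(created) %in% 2:6)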

The mean opening time after the morning rush was 9:28 AM, and the mean opening time after the evening rush was 9:25, which means my assumption was pretty far off and I should not feel so stressed about arriving at the bridge before 9 AM and 6 PM.

Batch Collection of Park Boundaries With Open Street Map

Open Street Map (OSM) is, simply put, a freely available and editable map of the world. I have been interested in improving the availability of boundary data for Seattle and wanted to add park boundaries to that list. It is easy to look up an individual boundary on OSM; for example, Salmon Bay Park shows the various nodes that make up its boundary. But I had struggled with how to automate this search, since at last count Seattle had over 400 parks. After months of struggling with the OSM API, I fortuitously stumbled across the following tweet:

That tweet led me to Mapzen, which offers a service called Metro Extracts that publishes weekly datasets extracted from OSM. I downloaded the OSM2PGSQL GeoJSON files for Seattle, which include separate files for line, point, and polygon geometries. I then used ogr2ogr to filter for parks only with a command along these lines (the input and output file names are placeholders):

ogr2ogr -f GeoJSON seattle_parks.geojson seattle_polygons.geojson -select "osm_id, name" -where "leisure = 'park'"

This produced a GeoJSON file that looked like this:

Obviously, more filtering needed to be done, since many of these parks were not in Seattle. I used the Nominatim API to look up each park by its OSM ID; for example, the above-mentioned Salmon Bay Park returns a nicely formatted XML file, which I filtered on the city field.
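
A sketch of that lookup using httr and xml2; the use of the /lookup endpoint, the "W" (way) prefix, and the city element in the response are assumptions about the Nominatim API:

library(httr)
library(xml2)

lookup_city <- function(osm_way_id) {
  resp <- GET("https://nominatim.openstreetmap.org/lookup",
              query = list(osm_ids = paste0("W", osm_way_id),
                           format = "xml", addressdetails = 1))
  doc <- read_xml(content(resp, as = "text"))
  # pull the city element out of the address details, if present
  xml_text(xml_find_first(doc, "//city"))
}

# usage: lookup_city(<osm id of the park>), then keep parks where the result is "Seattle"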

Even after this there were still parks wrongly labelled as being in Seattle. I loaded the file into R, subset it by OSM ID, and then used rgdal to write the final result out as a GeoJSON file.

The take-home lesson for me is that OSM is an excellent service, but as with any publicly annotated dataset, be prepared to invest some time in cleaning and validating the data.

Update on Restaurant Changes

I have been tracking restaurant openings via the City of Seattle Business Finder since the beginning of this year and am reporting those changes at Seattle Restaurant Changes. Recently I put up a heatmap showing changes by neighborhood. That heatmap shows a current snapshot of the changes, which made me curious about changes by restaurant type over the course of the year.

A few notes:

  • The City of Seattle uses North American Industry Classification System (NAICS) codes to track restaurants. I use the date of permit issuance as a proxy for a restaurant opening and the date of permit revocation as a proxy for a closing.

  • A Full Service Restaurant as defined by NAICS is “establishments primarily engaged in providing food services to patrons who order and are served while seated (i.e., waiter/waitress service) and pay after eating”

  • I realized that not that many breweries would be opening up but who doesn’t want more breweries in town?

  • I was not expecting Full Service Restaurants to take off as much as they did, especially since Limited Service Restaurants seem to be declining.

I will try to post another update on this in December, that is unless I decide to open up a food truck of my own.

The Felix Factor

I was listening to the Jonah Keri podcast, on which he and Ben Gibbard were talking about the Mariners, specifically Felix Hernandez. One of the points Gibbard made was that Hernandez is so outstanding that he will be remembered, and that people should try to see him pitch in person. This made me wonder: did Felix Hernandez have an impact on home ticket sales for the Mariners in 2014?

I was able to get all the data from the nicely formatted box score data that MLB provides. I initially tried to look at attendance over the course of the year, but it was so variable (which made for an extremely confusing plot) that I ended up making a box-and-whiskers plot and ignoring the date element:
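
A rough sketch of how such a plot could be drawn, assuming a data frame called games with an attendance column and a logical felix flag marking Hernandez's starts (both names are hypothetical):

library(ggplot2)

# compare home attendance for Hernandez starts vs. all other home games
ggplot(games, aes(x = felix, y = attendance)) +
  geom_boxplot() +
  labs(x = "Hernandez start", y = "Home attendance")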

Conclusion: Hernandez was not that strong a driver of ticket sales, which is great news if you are hoping to see him pitch in person.

Seattle Restaurant Changes

Seattle construction is currently booming, and I was interested in how that is reflected in the local restaurant scene. Many food blogs and local news sites cover openings and closings, but I found it too difficult to parse these regularly. Fortunately, I was able to use data from the City of Seattle business finder, with the restaurant classification (NAICS code) as a proxy. Using the data in this manner assumes that a restaurant no longer holds a business license after it closes. I'm not sure how accurate that is, but I figured it was as accurate as I could get short of hiring people on Mechanical Turk to phone every restaurant every week and ask if it is still open. To map each restaurant to a particular neighborhood, I geocoded the license address returned by the City of Seattle business finder. Obviously that does not work as well for Mobile Food Services (i.e. food trucks), but it still allows for an interesting comparison. This data is plotted at Seattle Restaurant Changes.

I initially attempted to scrape data from The Stranger, but after finding the City of Seattle site I just used BeautifulSoup to scrape that instead. I would not have been able to get much further than that had it not been for Nathan Yau's excellent tutorial on making maps with category filters. I was able to get a state-level neighborhood shapefile for Washington from Zillow and then reduce it to just Seattle neighborhoods using R's sp package (sketched below). Full code is posted on GitHub.
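
The neighborhood step looked roughly like this; the Zillow layer name, its CITY and NAME fields, and the restaurants object (a geocoded SpatialPointsDataFrame) are all assumptions:

library(rgdal)
library(sp)

# load the Zillow neighborhood shapefile for Washington and keep Seattle only
wa.hoods <- readOGR(dsn = "ZillowNeighborhoods-WA", layer = "ZillowNeighborhoods-WA")
seattle.hoods <- subset(wa.hoods, CITY == "Seattle")

# assign each geocoded restaurant point to a neighborhood polygon
# (assumes restaurants and seattle.hoods share the same CRS)
restaurants$neighborhood <- over(restaurants, seattle.hoods)$NAME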

First Year on Fitbit

After a year on Fitbit, I figured it was time to take a look at the data I have been generating. Unfortunately, Fitbit makes you sign up for Premium, which costs $50 per year, in order to export your data. Fortunately, Cory Nissen has created an excellent R package for doing just this. The package simply uses a POST request, handled by Hadley Wickham's httr library, to generate a cookie and then parses the results it gets back into a nice data.frame.

Anyways, onto the data.

The first function I tried was get_15_min_data(), which pulls step data in 15-minute increments. I figured that looking at yesterday's data would be granular enough to get a good feel for it.
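
A minimal sketch of that call, assuming the package is fitbitScraper and that its login() function returns the cookie the other functions expect (credentials are placeholders):

library(fitbitScraper)

cookie <- login(email = "you@example.com", password = "not-my-real-password")
yesterday <- as.character(Sys.Date() - 1)
steps_15 <- get_15_min_data(cookie, what = "steps", date = yesterday)
head(steps_15)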

I then plotted the number of steps taken per day, with a smoothing function overlaid:
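
The daily series can be pulled and smoothed with something like the following, assuming a get_daily_data() function in the same package that returns time and steps columns (the date range is a placeholder):

library(ggplot2)

daily <- get_daily_data(cookie, what = "steps",
                        start_date = "2014-04-01", end_date = "2015-04-01")

# daily points with a smoother overlaid to show the overall trend
ggplot(daily, aes(x = time, y = steps)) +
  geom_point() +
  geom_smooth()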

I had a mean step count of 13,935 for the past year. This data is more interesting to look at as an overall trend. There is definitely a seasonal trend in the summer, which makes sense. I can also see the signatures of a four-day backpacking trip in August and of breaking two ribs in mid-March, which confined me to the couch for four days.

Since I have a Fitbit One, I can also measure floors climbed.

My mean number of floors climbed is 69.62, which seems absurdly high. My desk is on the fourth floor of my building and I usually take the stairs, but I am not sure that is enough to fully explain why these counts are so high.

Still, it is pretty interesting to look at this data outside of the Fitbit interface, and I would highly recommend checking out Cory's GitHub repo.

Also, speaking of GitHub, for those of you who regularly follow this blog (hi, Mom!), I have moved away from making a new gist every time and now keep everything in a standalone repo.

Offsetting Beer by Running

Last year, among other personal data, I tracked every bar I went to and every mile I ran. Naturally, my first question was: do I run enough to offset the amount of beer I drink (at bars)?

First, we define some units. According to this Runner's World calculator, at an 8:45 minute/mile pace and my weight I burn 145 calories. Google says a pint of beer has about 180 calories. Since I usually average about two beers each time I go to a bar, that simplifies the calculations. Over the course of the year, how often was I above or below the residual? To answer this, I used R and finally got around to trying tidyr, which is pretty slick.

I thought a lot about how to determine the residual but eventually settled on calories out minus calories in, because I felt that made for the best visualization. As you can see, around week 30 I started to run more and did a better job of offsetting my beer consumption. Obviously this is an overly simplistic view of my caloric expenditure, but it shows some of the interesting insights that can be gained from personal data.
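
A rough sketch of the bookkeeping with dplyr and tidyr; the weekly log is simulated here, and the comments flag the calorie assumptions:

library(dplyr)
library(tidyr)

# hypothetical weekly log: miles run and beers drunk at bars
weekly <- data.frame(week = 1:52,
                     miles = runif(52, 0, 20),
                     beers = rpois(52, 4))

residuals <- weekly %>%
  mutate(calories_out = miles * 145,   # assumes the 145-calorie figure above is per mile
         calories_in  = beers * 180,   # ~180 calories per pint
         residual     = calories_out - calories_in)

# gather() reshapes the two calorie columns into long form for plotting
long <- residuals %>% gather(measure, calories, calories_out, calories_in)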

As always, all code and data are in this GitHub gist.

Summarizing Books Read Over Time

I recently read an interesting blog post in which the author examined the books they had rated on Goodreads and summarized some interesting trends. I decided to do a similar analysis, even though I use LibraryThing instead.

LibraryThing has a nice option that allows you to export your data in a variety of formats. Since I write R code to parse CSV files every day, I thought I would do something different and parse a JSON file with Python.

I have been on LibraryThing since 2007, and the first question I was interested in was: have my average ratings changed over time? I calculated the mean rating of the books read in each year:

Year Average Rating
2007 3.446809
2008 3.480000
2009 3.485294
2010 3.641509
2011 3.456522
2012 3.529412
2013 3.321429
2014 3.614583

While not exactly exciting, this makes a lot of sense: if I am reading a book that I do not enjoy, I will usually bail on it, which tends to bias my ratings upward. Over time, there have been a few notable exceptions.

One of the other interesting analyses in the blog post examined how the reviewer's ratings changed by month of the year. I wanted to make a similar plot using R's ggplot2; however, since I was writing this in Python, I was largely limited to matplotlib. Fortunately, many people have struggled with this issue, and the fine folks at yhat have ported ggplot2 to Python. With this library I was able to use geom_smooth to produce the following plot showing rating trends by week.

I could not figure out why my legend never showed up, but since most of the trend lines were pretty much the same anyway, the plot was fine without one. It appears that I hand out most of my good reviews early in the year and am harsher later in the year.

The last figure in the blog post compares the writer's review scores to the Goodreads consensus scores. I attempted to replicate this, but extracting that data from LibraryThing was more trouble than it was worth, so I abandoned that analysis.

If interested, I put my python code in a GitHub gist.