Summarizing Books Read Over Time
I recently read an interesting blog post in which the author examined the books they had rated on Goodreads and summarized some interesting trends. I decided to do a similar analysis, even though I use LibraryThing instead.
LibraryThing has a nice option that allows you to export your data in a variety of formats. Since I write R code to parse CSV files every day, I thought I would do something different and parse a JSON file with Python.
I have been on LibraryThing since 2007, and the first question I was interested in was whether my average ratings have changed over time. I calculated the mean rating for each year:
While uninteresting, this makes a lot of sense: if I am reading a book that I do not enjoy, I will usually bail on it, which tends to bias my ratings upward. Over time, there have been a few notable exceptions.
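For anyone who wants to try the same calculation, here is a minimal sketch of the yearly average. It is not the code from my gist, and it makes assumptions about the export: that the JSON is an object keyed by book ID, and that each book has a numeric `rating` field and an `entrydate` field formatted as `YYYY-MM-DD`. Those field names are guesses for illustration, so check them against your own export file.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_rating_by_year(books):
    """Return {year: mean rating}, skipping unrated or undated books."""
    ratings = defaultdict(list)
    for book in books:
        rating = book.get("rating")    # assumed field name
        added = book.get("entrydate")  # assumed field name, "YYYY-MM-DD"
        if not rating or not added:
            continue
        ratings[int(added[:4])].append(float(rating))
    return {year: mean(vals) for year, vals in sorted(ratings.items())}


# Hypothetical sample standing in for the real export; with a file you
# would use json.load(open(path)) instead of json.loads(...).
sample = json.loads("""
{
  "1": {"title": "A", "rating": 4, "entrydate": "2007-03-01"},
  "2": {"title": "B", "rating": 3, "entrydate": "2008-01-15"},
  "3": {"title": "C", "rating": 4, "entrydate": "2008-06-30"},
  "4": {"title": "D", "entrydate": "2008-07-01"}
}
""")

yearly = mean_rating_by_year(sample.values())
for year, avg in yearly.items():
    print(year, round(avg, 2))
```

Unrated books are dropped rather than counted as zero, which matters here because abandoned books often never get a rating at all.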
One of the other interesting analyses in the blog post examined how the reviewer’s ratings changed with the month of the year. I wanted to make a similar plot using R’s ggplot2; however, since I was writing this in Python, I was largely limited to matplotlib. Fortunately, many people have struggled with this issue, and the fine folks at yhat have ported ggplot2 over to Python. With this library I was able to use geom_smooth to produce the following plot showing rating trends by week.
I tried to figure out why my legend never showed up, but since most of the trend lines were pretty much the same anyway, I decided the plot was fine without one. It appears that I give most of my good reviews early in the year and am harsher later in the year.
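If you just want the numbers behind a week-of-year trend without any plotting library, the idea can be sketched with the standard library alone. This is an illustration under assumptions, not what geom_smooth does internally: it groups ratings by ISO week number across all years and then applies a simple centered moving average as a stand-in smoother. The function names and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_means(records):
    """records: iterable of (date, rating). Returns {iso_week: mean rating}."""
    by_week = defaultdict(list)
    for day, rating in records:
        by_week[day.isocalendar()[1]].append(rating)  # index 1 is the ISO week
    return {week: mean(vals) for week, vals in sorted(by_week.items())}


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    return [
        mean(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Hypothetical sample: (date rated, rating) pairs from different years.
records = [
    (date(2010, 1, 4), 5.0),
    (date(2011, 1, 5), 4.0),
    (date(2010, 1, 12), 4.0),
    (date(2010, 1, 19), 3.0),
]

weekly = weekly_means(records)
smoothed = moving_average(list(weekly.values()))
```

A moving average is much cruder than a proper smoother, but it is enough to see whether ratings drift downward as the year goes on.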
The last figure in the blog post compares the writer’s review scores to the Goodreads consensus score. I attempted to replicate this, but extracting that data from LibraryThing was more trouble than it was worth, so I abandoned that analysis.
If you are interested, I put my Python code in a GitHub gist.