Simple Webstats With R

As someone who publishes writing publicly, I am naturally curious who (if anyone) is actually reading it. To find out, I developed a simple webstat calculator using R. I realize there are many options out there for tracking visits, but to paraphrase my friend Andy, when has using standard libraries led to anything cool?

My main interest in this project is answering two questions:

  1. Are people visiting this site?

  2. Where are they visiting from?

I don’t really care about things like bounce rate or the type of device used to access the site. Not having to worry about either of these helps cut down on the complexity. I run this site on an Apache server and use a standard log format to write my logfile: LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
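To make the log format concrete, here is a minimal sketch of how one line in that format could be split into fields in base R. The names `log_pattern` and `parse_log`, the column names, and the sample line are all made up for illustration and are not taken from the actual script. The sketch also assumes that quoted fields contain no escaped quotes and that the locale uses English month abbreviations.

```r
# Regular expression mirroring the LogFormat fields, one capture group each.
# Assumption: quoted fields never contain an escaped double quote.
log_pattern <- paste0(
  '^(\\S+):(\\d+) ',          # %v:%p  virtual host and port
  '(\\S+) (\\S+) (\\S+) ',    # %h %l %u  client host, ident, user
  '\\[([^\\]]+)\\] ',         # %t  timestamp in square brackets
  '"([^"]*)" ',               # %r  request line
  '(\\d{3}) (\\S+) ',         # %>s %O  final status, bytes sent
  '"([^"]*)" "([^"]*)"$'      # Referer and User-Agent headers
)

# Hypothetical helper: turn raw log lines into a data frame.
parse_log <- function(lines) {
  m <- regmatches(lines, regexec(log_pattern, lines, perl = TRUE))
  m <- m[lengths(m) == 12]    # full match + 11 groups; drop malformed lines
  fields <- matrix(as.character(unlist(m)), ncol = 12, byrow = TRUE)
  data.frame(
    vhost   = fields[, 2],
    host    = fields[, 4],
    # %b depends on the locale; this assumes English month names.
    time    = as.POSIXct(strptime(fields[, 7], "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")),
    request = fields[, 8],
    status  = as.integer(fields[, 9]),
    referer = fields[, 11],
    agent   = fields[, 12],
    stringsAsFactors = FALSE
  )
}

# Made-up sample line using a documentation-reserved IP address.
sample_line <- paste0(
  'example.com:80 203.0.113.7 - - [10/Oct/2015:13:55:36 -0700] ',
  '"GET /images/plot.png HTTP/1.1" 200 2326 ',
  '"http://example.com/post.html" "Mozilla/5.0"'
)
parsed <- parse_log(sample_line)
print(parsed[, c("host", "time", "request", "status")])
```

Lines that do not match the pattern are silently dropped, which keeps the sketch short but would hide a misconfigured format, so counting the dropped lines may be worth adding.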

I made a small R script that uses knitr to output plots to HTML for easy viewing. I wrote a shell script that uses the excellent littler ("little r") to run the commands. The shell script runs daily as a cron job and only looks back at the past week’s worth of data. Since this blog is served on GitHub Pages, it can be difficult to see page views, so I use image loads as a proxy.
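The "past week of image loads" idea can be sketched in a few lines of base R. This is only an illustration under assumptions: the function name `daily_visitors`, the input columns (`host`, `time`, `request`, `status`), the image extensions, and the toy data are all hypothetical, and counting distinct hosts per day is just one reasonable reading of "visitors".

```r
# Hypothetical helper: distinct client hosts per day that loaded an image
# within the last `days` days. `log` is assumed to have columns
# host (character), time (POSIXct), request (character), status (integer).
daily_visitors <- function(log, now = Sys.time(), days = 7) {
  # Assumed image extensions; an optional query string is allowed.
  is_image <- grepl("\\.(png|jpe?g|gif)(\\?\\S*)? HTTP/", log$request,
                    ignore.case = TRUE, perl = TRUE)
  is_recent <- log$time >= now - as.difftime(days, units = "days")
  # 304 (Not Modified) still means a browser asked for the image.
  is_served <- log$status %in% c(200L, 304L)
  hits <- log[is_image & is_recent & is_served, ]
  day <- format(hits$time, "%Y-%m-%d", tz = "UTC")
  vapply(split(hits$host, day), function(h) length(unique(h)), integer(1))
}

# Toy data standing in for a parsed logfile.
toy_log <- data.frame(
  host = c("203.0.113.7", "203.0.113.7", "198.51.100.2", "198.51.100.2", "192.0.2.9"),
  time = as.POSIXct(c("2015-10-10 09:00:00", "2015-10-10 09:05:00",
                      "2015-10-10 12:00:00", "2015-10-11 08:00:00",
                      "2015-09-01 08:00:00"), tz = "UTC"),
  request = c("GET /images/a.png HTTP/1.1", "GET /images/b.png HTTP/1.1",
              "GET /images/a.png HTTP/1.1", "GET /index.html HTTP/1.1",
              "GET /images/a.png HTTP/1.1"),
  status = c(200L, 200L, 304L, 200L, 200L),
  stringsAsFactors = FALSE
)
visitors <- daily_visitors(toy_log, now = as.POSIXct("2015-10-12 00:00:00", tz = "UTC"))
print(visitors)
```

Passing `now` explicitly makes the window reproducible; a daily cron run would simply leave it at the default `Sys.time()`.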

Here are some example plots of recent visitors:

And then another plot of visitor locations:

That outlier from Brazil is likely a Google bot crawling the site; better detection and removal of bot traffic from the final output is on the TODO list. All of the code (minus the shell script) lives on GitHub.
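As a starting point for that TODO item, one common and admittedly crude approach is to flag requests whose User-Agent string contains a crawler keyword. The sketch below is an assumption about how this could be done, not the code in the repository; `bot_pattern` and `is_bot` are hypothetical names, and the keyword list is far from exhaustive.

```r
# Assumed keyword list; well-behaved crawlers usually identify themselves,
# but anything spoofing a browser User-Agent will slip through.
bot_pattern <- "bot|crawl|spider|slurp"

# Hypothetical helper: TRUE for User-Agent strings that look like crawlers.
is_bot <- function(agent) {
  grepl(bot_pattern, agent, ignore.case = TRUE)
}

agents <- c(
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0"
)
flags <- is_bot(agents)
print(data.frame(bot = flags, agent = substr(agents, 1, 45)))
```

Rows flagged this way could be dropped before plotting, so a single crawler no longer shows up as an outlier on the map.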