Sentiment Analysis of U.S. Senate Candidate Statements

The 2018 Washington State Primary was held on August 7, 2018. As a registered voter in Washington State, I am mailed a Voter’s Information Pamphlet, which lists the candidates for each office along with a Candidate Statement provided by each candidate. In the Candidate Statement, candidates may write anything they want as long as it’s under 300 words. I was curious whether there was any relationship between the sentiment of a candidate’s Statement and the number of votes that candidate received.

I used lexicon-based sentiment analysis, which assigns words to groups based on pre-defined lists of the words that belong to each group. There are many word groups to choose from, such as “joy” and “trust”, but for this analysis I looked only at “positive” and “negative” words (as classified by the NRC lexicon).
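To make the word-group idea concrete, here is a minimal Python sketch of lexicon-based counting. It is not the code behind this analysis: the six-word `LEXICON`, the `count_sentiment` helper, and the sample statement are all made up for illustration, and the real NRC lexicon is far larger.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, invented for illustration only.
# A real run would load the full NRC word lists instead.
LEXICON = {
    "hope": "positive",
    "proud": "positive",
    "protect": "positive",
    "corrupt": "negative",
    "crisis": "negative",
    "fail": "negative",
}


def count_sentiment(statement: str) -> Counter:
    """Count how many words in a statement fall into each sentiment group."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", statement.lower())
    # Keep only words that appear in the lexicon and tally their groups.
    return Counter(LEXICON[w] for w in words if w in LEXICON)


# Made-up statement, not taken from any real candidate.
statement = "I am proud to protect our state from a corrupt system in crisis. I have hope."
counts = count_sentiment(statement)
print(counts["positive"], counts["negative"])  # prints: 3 2
```

Words that are not in the lexicon are simply ignored, so a statement's counts depend entirely on which word lists are used.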

Although many different offices were up for election in this Primary, the U.S. Senate race had 29 candidates, which made for a very rich dataset. Washington State also uses a Top-Two Primary, in which candidates from all parties compete on the same ballot, so it is easy to compare across political parties.

This analysis deliberately ignores many factors, such as PVI (Partisan Voting Index), incumbency, fundraising, political party, and candidate issues. However, I thought text mining could be an interesting way to look at these candidates from a slightly different angle.

First I just plotted the number of positive words in the Candidate Statement for each candidate:

Then I repeated with the number of negative words in the Candidate Statement for each candidate:

Next I looked at the number of positive words in each candidate’s Statement versus the number of votes that candidate received. The Democratic incumbent, Maria Cantwell, and the Republican establishment candidate, Susan Hutchison, received so much of the total vote that I had to log-transform the vote counts to keep the other candidates distinguishable on the plot.
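To illustrate why a log transform helps here, the following Python sketch compares the spread of some vote totals before and after the transform. The vote totals are invented placeholders, not the actual primary results, and base-10 logs are an assumption; the transform used for the plots may have used a different base.

```python
import math

# Hypothetical vote totals, invented purely for illustration.
votes = {
    "Candidate A": 900_000,
    "Candidate B": 400_000,
    "Candidate C": 9_000,
    "Candidate D": 1_500,
}

# Base-10 log of each total (assumed base; any base compresses the scale similarly).
log_votes = {name: math.log10(v) for name, v in votes.items()}

# On the raw scale the largest total is hundreds of times the smallest,
# so the small candidates would be squashed against the axis.
raw_ratio = max(votes.values()) / min(votes.values())

# On the log scale the largest value is less than twice the smallest.
log_ratio = max(log_votes.values()) / min(log_votes.values())

for name, value in log_votes.items():
    print(f"{name}: {value:.2f}")
```

With numbers like these, the raw axis is dominated by the top two totals, while the log axis spreads all candidates over a comparable range.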

Then I repeated this analysis by looking at the negative word count for each candidate versus the total log-transformed vote count:

While I don’t think this will lead to any novel political insights, I do think it’s an interesting way to look at candidates. Full code, including the individual candidate statements, is available here.