One of my favourite podcasts is Slate’s Lexicon Valley. All about language, it is rigorous and detailed in its approach to the subject, which appeals to the closet academic in me, but also extremely entertaining. It is a sign of a good podcast to find yourself bursting out laughing while walking down a busy city street. Lexicon Valley is to blame for numerous moments of alarm for my fellow commuters.
In September last year, hosts Mike Vuolo (the knowledgeable one) and Bob Garfield (the funny one) interviewed linguist Geoffrey Nunberg, talking to him about his recent book, Ascent of the A-Word: Assholism the First Sixty Years. A half-hour discussion of the evolution of the word “asshole” helps earn this podcast an “Explicit” tag in the iTunes store and, as a result, this will be the first Stubborn Mule post that may fall victim to email filters. Apologies in advance to anyone of a sensitive disposition and to any email subscribers this post fails to reach.
Nunberg traces the evolution of “asshole” from its origins among US soldiers in the Second World War through to its current role as a near-universal term of abuse for arrogant boors lacking self-awareness. Along the way, he explores the differences between profanity (swearing drawing on religion), obscenity (swearing drawing on body parts and sexual activity) and plain old vulgarity (any of the above).
The historical perspective of the book is supported by charts using Google “n-grams”. An n-gram is any sequence of n consecutive words found in a book (a single word is a 1-gram, a two-word phrase a 2-gram), and one type of quantitative analysis used by linguists is to track the frequency of n-grams in a “corpus” of books. After working for years with libraries around the world, Google has amassed a particularly large corpus: Google Books. Conveniently for researchers like Nunberg, with the help of the Google n-gram Viewer, anyone can analyse n-gram frequencies across the Google Books corpus. For example, the chart below shows that “asshole” is far more prevalent in books published in the US than in the UK. No surprises there.
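To make the idea of tracking n-gram frequency concrete, here is a minimal Python sketch. It is not how Google builds its data: the toy corpus, the whitespace tokenisation and the function names are all assumptions made for illustration. The frequency for each year is taken to be the number of times the phrase appears divided by the total number of n-grams of the same length in that year's books.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(books_by_year, phrase):
    """Map each year to the share of same-length n-grams that match phrase."""
    n = len(phrase.split())
    target = phrase.lower()
    result = {}
    for year, texts in books_by_year.items():
        counts = Counter()
        for text in texts:
            # Naive tokenisation: lower-case and split on whitespace.
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus: a few "books" keyed by publication year.
toy_corpus = {
    1950: ["the cat sat on the mat"],
    2000: ["the cat saw the other cat", "a cat is a cat"],
}

unigram_freq = relative_frequency(toy_corpus, "cat")
bigram_freq = relative_frequency(toy_corpus, "the cat")
```

A real corpus would need proper tokenisation, punctuation handling and far more text, but the ratio being plotted is the same kind of quantity.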
If “asshole” is the American term, the Australian and British equivalent should be “arsehole”, but surprisingly arsehole is less common than asshole in the British Google Books corpus. This suggests that, while being a literal equivalent to asshole, arsehole really does not perform the same function. If anything, it would appear that the US usage of asshole bleeds over to Australia and the UK.
Intriguing though these n-gram charts are, they should be interpreted with caution, as I learned when I first tried to replicate some of Nunberg’s charts.
The chart below is taken from Ascent of the A-Word and compares growth in the use of the words “asshole” and “empathetic”. The frequencies are scaled relative to the frequency of “asshole” in 1972*. At first, try as I might, I could not reproduce Nunberg’s results. Convinced that I must have misunderstood the book’s explanation of the scaling, I wrote to Nunberg. His more detailed explanation confirmed my original interpretation, which meant that I still could not reproduce the chart.
Relative growth of “empathetic” and “asshole”
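The scaling itself is simple arithmetic. The sketch below shows one reading of it, in which every frequency is divided by the frequency of “asshole” in the reference year. The figures are invented purely for illustration (they are not taken from the book or from Google's data) and the function name is hypothetical.

```python
def scale_to_reference(series, reference_word, reference_year):
    """Divide every frequency by the reference word's frequency in one year."""
    base = series[reference_word][reference_year]
    if base == 0:
        raise ValueError("reference frequency is zero, cannot scale")
    return {
        word: {year: value / base for year, value in by_year.items()}
        for word, by_year in series.items()
    }


# Invented frequencies (occurrences per million words), for illustration only.
raw = {
    "asshole": {1972: 2.0, 2000: 8.0},
    "empathetic": {1972: 1.0, 2000: 6.0},
}

scaled = scale_to_reference(raw, "asshole", 1972)
```

After scaling, the reference word sits at exactly 1 in the reference year and every other point reads as a multiple of that level, so changing the reference year from 1972 to 1971 would rescale the whole chart.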
Then I had an epiphany. It turns out that Google has published two sets of n-gram data. The first release of the data was based on an analysis of the Google Books collection in July 2009, described in the paper Michel, Jean-Baptiste, et al. “Quantitative analysis of culture using millions of digitized books”, Science 331, No. 6014 (2011): 176-182. As time passed, Google continued to build the Google Books collection and in July 2012 a second n-gram data set was assembled. As the charts below show, the growth of “asshole” and “empathetic” is somewhat different depending on which edition of the n-gram data set is used. I had been using the more recent 2012 data set and, evidently, Nunberg used the 2009 data set. While either chart would support the same broad conclusions, the differences show that smaller movements in these charts are likely to be meaningless and not too much should be read into anything other than large-scale trends.
So far I have not done very much to challenge anyone’s email filters. I can now fix that by moving on to a more recent Lexicon Valley episode, A Brief History of Swearing. This episode featured an interview with Melissa Mohr, the author of Holy Shit: A Brief History of Swearing. In this book Mohr goes all the way back to Roman times in her study of bad language. Well-preserved graffiti in Pompeii is one of the best sources of evidence we have of how to swear in Latin. Some Latin swear words were very much like our own, others were very different.
Of the “big six” swear words in English, namely ass, cock, cunt, fuck, prick and piss (clearly not all as bad as each other!), five had equivalents in Latin. The only one missing was “piss”. It was common practice to urinate in jars left in the street by fullers who used diluted urine to wash clothing. As a result, urination was not particularly taboo and so not worthy of being the basis for vulgarity. Mohr goes on to enumerate another five Latin swear words to arrive at a list of the Roman “big ten” obscenities. One of these was the Latin word for “clitoris”, which was a far more offensive word than “clit” is today. I also learned that our relatively polite, clinical terms “penis”, “vulva” and “vagina” all derive from obscene Latin words. It was the use of these words by the upper class during the Renaissance, speaking in Latin to avoid corrupting the young, that caused these words to become gentrified.
Unlike Nunberg, Mohr does not make use of n-grams in her book, which provides a perfect opportunity for me to track the frequency of the big six English swear words.
The problem with this chart is that the high frequency of “ass” and “cock”, particularly in centuries gone by, is likely augmented by their use to refer to animals. Taking a closer look at the remaining four shows just how popular the use of “fuck” became in the second half of the twentieth century, although “cunt” and “piss” have seen modest (or should I say immodest) growth. Does this mean we are all getting a little more accepting of bad language? Maybe I need to finish reading Holy Shit to find out.
* The label on the chart indicates that the reference year is 1972, but by my calculations the reference year is in fact 1971.