Tag Archives: language

How common are common words?

One of my favourite podcasts is Slate’s Lexicon Valley. All about language, it is rigorous and detailed in its approach to the subject, which appeals to the closet academic in me, but also extremely entertaining. It is a sign of a good podcast to find yourself bursting out laughing while walking down a busy city street. Lexicon Valley is to blame for numerous moments of alarm for my fellow commuters.

In September last year, hosts Mike Vuolo (the knowledgeable one) and Bob Garfield (the funny one) interviewed linguist Geoffrey Nunberg, talking to him about his recent book, Ascent of the A-Word: Assholism the First Sixty Years. A half-hour discussion of the evolution of the word “asshole” helps earn this podcast an “Explicit” tag in the iTunes store and, as a result, this will be the first Stubborn Mule post that may fall victim to email filters. Apologies in advance to anyone of a sensitive disposition and to any email subscribers this post fails to reach.

Nunberg traces the evolution of “asshole” from its origins among US soldiers in the Second World War through to its current role as a near-universal term of abuse for arrogant boors lacking self-awareness. Along the way, he explores the differences between profanity (swearing drawing on religion), obscenity (swearing drawing on body parts and sexual activity) and plain old vulgarity (any of the above).

The historical perspective of the book is supported by charts using Google “n-grams”. An n-gram is any word or phrase found in a book, and one type of quantitative analysis used by linguists is to track the frequency of n-grams in a “corpus” of books. After working for years with libraries around the world, Google has amassed a particularly large corpus: Google Books. Conveniently for researchers like Nunberg, with the help of the Google n-gram Viewer, anyone can analyse n-gram frequencies across the Google Books corpus. For example, the chart below shows that “asshole” is far more prevalent in books published in the US than in the UK. No surprises there.

[Chart: Use of “asshole” in US and UK books]

If “asshole” is the American term, the Australian and British equivalent should be “arsehole”, but surprisingly arsehole is less common than asshole in the British Google Books corpus. This suggests that, while being a literal equivalent to asshole, arsehole really does not perform the same function. If anything, it would appear that the US usage of asshole bleeds over to Australia and the UK.

[Chart: “asshole” versus “arsehole”]
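To make the idea of an n-gram frequency concrete, here is a minimal Python sketch. It assumes that "frequency" means the share of all n-grams of the same length found in a given year's books; the function names and the two-year toy corpus are invented for illustration, and the tokenisation (lower-casing and splitting on whitespace) is far cruder than whatever Google actually does.

```python
from collections import Counter


def ngrams(text, n):
    """Yield each run of n consecutive words in text, lower-cased."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def relative_frequency(corpus_by_year, phrase):
    """Map each year to the share of same-length n-grams equal to phrase."""
    n = len(phrase.split())
    result = {}
    for year, texts in corpus_by_year.items():
        counts = Counter(g for t in texts for g in ngrams(t, n))
        total = sum(counts.values())
        result[year] = counts[phrase.lower()] / total if total else 0.0
    return result


# Hypothetical corpus, made up purely for illustration.
toy_corpus = {
    1950: ["the cat sat on the mat", "the dog sat"],
    2000: ["the cat and the cat", "a cat sat"],
}

unigram_freq = relative_frequency(toy_corpus, "cat")
bigram_freq = relative_frequency(toy_corpus, "the cat")
print(unigram_freq)
print(bigram_freq)
```

Plotting such a dictionary against the year gives the kind of line seen in the charts above, although with a real corpus the counts run to billions of words.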

Intriguing though these n-gram charts are, they should be interpreted with caution, as I learned when I first tried to replicate some of Nunberg’s charts.

The chart below is taken from Ascent of the A-word and compares growth in the use of the words “asshole” and “empathetic”. The frequencies are scaled relative to the frequency of “asshole” in 1972*. At first, try as I might, I could not reproduce Nunberg’s results. Convinced that I must have misunderstood the book’s explanation of the scaling, I wrote to Nunberg. His more detailed explanation confirmed my original interpretation, but meant that I still could not reproduce the chart.

[Chart from Nunberg’s book: Relative growth of “empathetic” and “asshole”]

Then I had an epiphany. It turns out that Google has published two sets of n-gram data. The first release of the data was based on an analysis of the Google Books collection in July 2009, described in the paper Michel, Jean-Baptiste, et al. “Quantitative analysis of culture using millions of digitized books” Science 331, No. 6014 (2011): 176-182. As time passed, Google continued to build the Google Books collection and in July 2012 a second n-gram data set was assembled. As the charts below show, the growth of “asshole” and “empathetic” is somewhat different depending on which edition of the n-gram data set is used. I had been using the more recent 2012 data set and, evidently, Nunberg used the 2009 data set. While either chart would support the same broad conclusions, the differences show that smaller movements in these charts are likely to be meaningless and not too much should be read into anything other than large-scale trends.

[Chart: Comparison of the 2009 and 2012 Google Books corpuses]
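The scaling behind the “asshole” versus “empathetic” chart can be sketched in a few lines of Python. This is only one reading of it, namely that every frequency is divided by the frequency of the reference word in the reference year. The function name and the numbers are invented for illustration and are not real Google Books figures.

```python
def scale_to_reference(series_by_word, ref_word, ref_year):
    """Divide every frequency by the frequency of ref_word in ref_year."""
    base = series_by_word[ref_word][ref_year]
    if base == 0:
        raise ValueError("reference frequency is zero")
    return {
        word: {year: value / base for year, value in series.items()}
        for word, series in series_by_word.items()
    }


# Made-up frequencies (say, occurrences per ten million words).
toy_series = {
    "asshole": {1971: 2.0, 1972: 2.5, 2000: 10.0},
    "empathetic": {1971: 1.0, 1972: 1.5, 2000: 8.0},
}

scaled_1972 = scale_to_reference(toy_series, "asshole", 1972)
scaled_1971 = scale_to_reference(toy_series, "asshole", 1971)
print(scaled_1972["asshole"][2000], scaled_1971["asshole"][2000])
```

Even in this toy case, moving the reference year by one shifts every point on the chart, and feeding in a different edition of the data would change the base value as well. That is one more reason to read only the large-scale trends.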

So far I have not done very much to challenge anyone’s email filters. I can now fix that by moving on to a more recent Lexicon Valley episode, A Brief History of Swearing. This episode featured an interview with Melissa Mohr, the author of Holy Shit: A Brief History of Swearing. In this book Mohr goes all the way back to Roman times in her study of bad language. Well-preserved graffiti in Pompeii is one of the best sources of evidence we have of how to swear in Latin. Some Latin swear words were very much like our own, others were very different.

Of the “big six” swear words in English, namely ass, cock, cunt, fuck, prick and piss (clearly not all as bad as each other!), five had equivalents in Latin. The only one missing was “piss”. It was common practice to urinate in jars left in the street by fullers who used diluted urine to wash clothing. As a result, urination was not particularly taboo and so not worthy of being the basis for vulgarity. Mohr goes on to enumerate another five Latin swear words to arrive at a list of the Roman “big ten” obscenities. One of these was the Latin word for “clitoris”, which was a far more offensive word than “clit” is today. I also learned that our relatively polite, clinical terms “penis”, “vulva” and “vagina” all derive from obscene Latin words. It was the use of these words by the upper class during the Renaissance, speaking in Latin to avoid corrupting the young, that caused these words to become gentrified.

Unlike Nunberg, Mohr does not make use of n-grams in her book, which provides a perfect opportunity for me to track the frequency of the big six English swear words.

[Chart: Frequency of the “Big Six” swear words]

The problem with this chart is that the high frequency of “ass” and “cock”, particularly in centuries gone by, is likely augmented by their use to refer to animals. Taking a closer look at the remaining four shows just how popular the use of “fuck” became in the second half of the twentieth century, although “cunt” and “piss” have seen modest (or should I say immodest) growth. Does this mean we are all getting a little more accepting of bad language? Maybe I need to finish reading Holy Shit to find out.

[Chart: Frequency of four of the “Big Six” swear words]

* The label on the chart indicates that the reference year is 1972, but by my calculations the reference year is in fact 1971.

Language is a virus

Language is a virus and we are its host. Some strains of language are virulent and spread rapidly. Others are weaker, struggling to infect their hosts and easily supplanted by stronger challengers.

The natural habitat of the language virus is the social group. Some of the more obvious forms are schoolyard slang (what was unreal in my day was sick in later years, but could now be random) or the jargon of specialists. Sometimes the ponds the virus infects can be large ones. By 2008, everyone in Australia knew that “GFC” stood for “Global Financial Crisis”, but I repeatedly saw visitors from the US or UK mystified by this initialism.

The corporate world is a rich source of (often meaningless) jargon, as decried by Paul Keating’s speechwriter Don Watson in Death Sentence: The Decay of Public Language. But what has fascinated me of late in the corporate world is not the language of mission statements, paradigms, closure or value-add, but simpler, more innocuous words or phrases that flourish within organisations. After a number of years away, I have been back less than two months at a firm I worked for before, and I was immediately struck by the near-universal use of a few expressions that I am sure were not being used there four years earlier, and were certainly not used at the company where I worked during the intervening years.

I have now realised that it is impossible to attend an internal meeting without someone suggesting an alternative lens with which to view a problem rather than, say, an alternative perspective. Even more prevalent is “calling out”, as in “I’ll just call out one or two points on this slide” or “Last time we met I called that out as the primary challenge”.

The point is not to criticise these terms themselves, which are quite reasonable means of expression, unlike so much of the corporate-speak that Don Watson ridicules. You could even make the case that “lens” is a better term as it suggests a point of view which can be quickly and simply changed, whereas “perspective” often has connotations of being more permanent. What fascinates me is the way these words have established such a firm hold on the organisation. It makes the social dimension of shared language very clear: if I start using the same terms as you, it makes me seem more a part of the group, which in turn reinforces your use of the terms. All of this can happen subconsciously, so that the hosts can be quite unaware of the infection. Some may notice, but to a newcomer like myself, the infestation is startlingly clear.

It probably will not be long until I find myself calling out the merits of putting on a different lens, but for now I am trying to be strong.

A way with words

Sometimes the things that are unsaid are far more telling than the things said.

I had cause to reflect on this when I stumbled across a book on my shelves that I have not opened for many years. The book, entitled “Deutsche Bank: Dates, facts and figures 1870-1993”, is an English translation of the year-by-year history of the bank compiled by Manfred Pohl and Angelika Raab-Rebentisch. In keeping with the title, the style is more bullet points than narrative. Nevertheless, I continue to find the pages spanning World War II strangely fascinating.

In 1938, with the connivance of the French and British, Germany annexed the Sudetenland in western Czechoslovakia. For Deutsche Bank, this meant more branches.

Deutsche Bank 1938

The following year, Deutsche Bank was fortunate enough to be able to continue its branch expansion, this time into Poland. At least this time, there is a mention of the events outside the bank that may have been relevant.

Deutsche Bank 1939

Another year, and some more expansion for the bank including a few branches in France. No need to mention the invasion of France here, of course.

Deutsche Bank 1940

From 1942, outside events start to interfere with the bank: the “impact of war” forces rather inconvenient branch closures.

DB War End

To see these extracts in the full context, here are the pages spanning 1934 to 1940 and 1940 to 1946.

The Art of Conversation

Have you ever heard the question “Would you like a tea or a coffee?” answered with a simple “Yes”? If so, the respondent almost certainly considers their response to be extremely witty. The questioner is unlikely to agree. There is also a high probability that the joker is someone’s Dad…or perhaps a mathematician.

I have to admit to having indulged in this “joke” in my time (more than once), but until recently it had not occurred to me that it in fact reflects a violation of a general principle of conversation. Enlightenment came when I read the seminal 1975 paper “Logic and Conversation” [1] by the philosopher H. P. Grice.

The humour (or lack thereof) of the coffee/tea gag lies in the conflict between the logical truth of the statement and its inappropriateness in conversation. While the statement “A or B” is logically true as long as at least one of A and B is true, in the context of conversation, logical truth is not enough. If you knew A was true and B was false, you would not bother saying “A or B”, you would just say “A”. Moreover, that is what others would expect of you. If I ask you to pass me a hammer, I don’t expect you to pass me a hammer and a spanner. In the same way, if you know you are going to Spain for your holidays, I don’t expect you to say “I’m either going to Spain or Canada”, despite the fact that, strictly speaking, it is a true statement. It is this distinction between simple logical truth and appropriateness in conversation that is the subject of Grice’s paper.
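For the mathematically inclined, the gap between logical truth and a useful answer can be spelled out as a small truth table. The Python sketch below is purely illustrative: both function names and the wording of the replies are made up, and it simply contrasts treating “tea or coffee?” as an inclusive “or” with giving the answer the questioner is actually after.

```python
from itertools import product


def literal_answer(wants_tea, wants_coffee):
    """Answer 'tea or coffee?' as a pure inclusive-or truth value."""
    return "Yes" if (wants_tea or wants_coffee) else "No"


def informative_answer(wants_tea, wants_coffee):
    """Answer the way the questioner expects: say which drink is wanted."""
    if wants_tea and wants_coffee:
        return "Both, please"
    if wants_tea:
        return "Tea, please"
    if wants_coffee:
        return "Coffee, please"
    return "Neither, thanks"


# One row per combination: (wants tea, wants coffee, literal, informative).
table = [
    (tea, coffee, literal_answer(tea, coffee), informative_answer(tea, coffee))
    for tea, coffee in product([True, False], repeat=2)
]
for row in table:
    print(row)
```

Three of the four rows get the same literal answer, “Yes”, which is logically true in each case and leaves the questioner none the wiser about what to pour.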

Grice bases his ideas on the notion of the “Cooperative Principle”, which he summarises as the requirement to

Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.

People have conversations of many types for many reasons: to do business, to gossip, to seduce, to educate, to inform or simply for the pleasure of conversation itself. In every case, conversation involves (at least) two participants and the conversations that work best are the ones that take the needs of all of the participants into account. So it makes sense that a bit of cooperation is the foundation of a good conversation.

Based on the cooperative principle, Grice goes on to postulate a number of “maxims of conversation”. Here are the maxims as he describes them:


Quantity

  1. Make your contribution as informative as is required (for the current purposes of the exchange).
  2. Do not make your contribution more informative than is required.

Quality

  1. Do not say what you believe to be false.
  2. Do not say that for which you lack adequate evidence.

Relation

  1. Be relevant.

Manner

  1. Avoid obscurity of expression.
  2. Avoid ambiguity.
  3. Be brief (avoid unnecessary prolixity).
  4. Be orderly.

The term “maxim” is carefully chosen as Grice notes that one need not follow all of the maxims at all times, while still being cooperative. The main reason that a maxim could be violated is if it is in conflict with another maxim. An example would be providing less information than required (violating Quantity 1) because you are not confident you have the facts right (and you don’t want to violate Quality 2).

Viewed in terms of Grice’s maxims, the coffee/tea joke is a clear violation of the first maxim of quantity.

As I have already admitted to this particular breach, the obvious question is: have I violated any other maxims? Some who know me well would take the view that, while I may take pains to avoid a violation of either of the maxims of quality, I regularly and flagrantly violate Quantity 2 and Manner 3 and probably Relation 1. I need to learn to stick to the point or risk being branded an uncooperative conversationalist! Or perhaps it’s too late.

[1] Available in the collection “Studies in the Way of Words” by H. P. Grice.