Monthly Archives: April 2012

Goodhart’s Law

Another post and another Law, but this time no mathematics is involved.

Imagine you are running a team of salespeople and, as a highly motivated manager, you are working on strategies to improve the performance of your team. After a close study of your team’s client call reports you realise that the high performers in the team consistently meet with their clients more frequently than the poor performers. Eureka! You now have a plan: you set targets for the number of times your team should meet with clients each week. Bonuses will depend upon performance against these targets. Confident that your new client call metric is highly correlated with sales performance, is objective and easily measurable, you sit back and wait.

Six months later, it is time to review the results. Initially you are pleased to discover that a number of your poor performers have achieved very good scores relative to your new targets. Most of the high performers have done well also, although you are a little disappointed that your best salesperson came nowhere near the “stretch target” you set. You then begin to review the sales results and find them very puzzling: despite the high number of client meetings, the results for most of your poor performers are worse than ever. Not only that, your top salesperson has had a record quarter. After you have worked out whether you can wriggle out of the commitment you made to link bonuses to your new metric, you would do well to reflect on the fact that you have fallen victim to Goodhart’s Law.

According to Goodhart’s Law, the very act of targeting a proxy (client meetings) to drive a desired outcome (sales performance) undermines the relationship between the proxy and the outcome. In the client meeting example, the relationship clearly broke down because your team immediately realised it was straightforward to “game” the metric, recording many meetings without actually doing a better job of selling. Your highest performer was probably too busy doing a good job to waste their clients’ time with unnecessary meetings.

The Law was first described in 1975 by Charles Goodhart in a paper delivered to the Reserve Bank of Australia. It had been observed that there was a close relationship between money supply and interest rates and, on this basis, the Bank of England began to target money supply levels by setting short-term interest rates. Almost immediately, the relationship between interest rates and money supply broke down. While the reason for the breakdown was loosening of controls on bank lending rather than salespeople gaming targets, the label “Goodhart’s Law” caught on.

Along with its close relatives Campbell’s Law and the Lucas Critique, Goodhart’s Law has been used to explain a broad range of phenomena, far removed from its origins in monetary policy. In 18th century Britain, a crude form of poll tax was levied based on the number of windows on every house. The idea was that the number of windows would be correlated with the number of people living in the house. It did not take long for householders to begin bricking up their windows. A possibly apocryphal example is the tale of the Soviet-era nail factory. Once central planners set targets for the weight of nail output, artful factory managers met their target by making just one nail, but an enormous and very heavy nail.

Much like the Law of Unintended Consequences, of which it is a special case, Goodhart’s Law is one of those phenomena that, once you learn about it, you cannot help seeing it at work everywhere.

Benford’s Law

Here is a quick quiz. If you visit the Wikipedia page List of countries by GDP, you will find three lists ranking the countries of the world in terms of their Gross Domestic Product (GDP), each list corresponding to a different source of the data. If you pick the list according to the CIA (let’s face it, the CIA just sounds more exciting than the IMF or the World Bank), you should have a list of figures (denominated in US dollars) for 216 countries. Ignore the fact that the European Union is in the list along with the individual countries, and think about the first digit of each of the GDP values. What proportion of the data points start with 1? How about 2? Or 3 through to 9?

If you think they would all be about the same, you have not come across Benford’s Law. In fact, far more of the national GDP figures start with 1 than any other digit and fewer start with 9 than any other digit. The columns in the chart below show the distribution of the leading digits (I will explain the dots and bars in a moment).

Distribution of leading digits of GDP for 216 countries (in US$)

This phenomenon is not unique to GDP. Indeed, a 1938 paper described a similar pattern of leading digit frequencies across a baffling array of measurements, including areas of rivers, street addresses of “American men of Science” and numbers appearing in front-page newspaper stories. The paper was titled “The Law of Anomalous Numbers” and was written by Frank Benford, who thereby gave his name to the phenomenon.

Benford’s Law of Anomalous Numbers states that for many datasets, the proportion of data points with leading digit n will be approximated by

log10(n+1) – log10(n).

So, around 30.1% of the data should start with a 1, while only around 4.6% should start with a 9. The dots in the chart above show these theoretical proportions, with the bars indicating the range of variation to be expected in a sample of this size. It would appear that the GDP data features more leading 2s and fewer leading 3s than Benford’s Law would predict, but it is a relatively small sample of data, so some variation from the theoretical distribution should be expected.
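To make the formula concrete, here is a short Python sketch (the analysis later in this post was done in R, so this is purely illustrative) that computes the theoretical Benford proportions and extracts the leading digit of a number:

```python
from math import log10

def benford_proportion(d):
    """Theoretical proportion of values with leading digit d (d in 1..9)."""
    return log10(d + 1) - log10(d)

def leading_digit(x):
    """First significant digit of a positive number."""
    s = str(abs(x)).lstrip("0.")  # drop any leading zeros and the decimal point
    return int(s[0])

proportions = {d: round(benford_proportion(d), 3) for d in range(1, 10)}
# leading 1s: about 30.1% of values; leading 9s: about 4.6%
```

Note that the nine proportions telescope to log10(10) - log10(1) = 1, so they form a proper probability distribution over the digits 1 to 9.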

As a variation of the usual tests of Benford’s Law, I thought I would choose a rather modern data set to test it on: Twitter follower numbers. Fortunately, there is an R package perfectly suited to this task: twitteR. With twitteR installed, I looked at all of the twitter users who follow @stubbornmule and recorded how many users follow each of them. With only a relatively small follower base, this gave me a set of 342 data points which follows Benford’s Law remarkably well.


Distribution of leading digits of follower counts

As a measure of how well the data follows Benford’s Law, I have adopted the approach described by Rachel Fewster in her excellent paper A Simple Explanation of Benford’s Law. For the statistically minded, this involves defining a chi-squared statistic which measures the “badness” of the Benford fit. This statistic provides a “p value”, which you can think of as the probability that Benford’s Law could produce a distribution that looks like your data set. The p value for the @stubbornmule follower counts is a very high 0.97, which shows a very good fit to the law. By way of contrast, if those 342 data points had a uniform distribution of leading digits, the p value would be less than 10^-15, which would be a convincing violation of Benford’s Law.
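Fewster defines her own chi-squared measure and I used her R code, but the idea can be sketched in Python with a standard Pearson chi-squared statistic (a simplification, not her exact method):

```python
from math import log10

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(str(abs(x)).lstrip("0.")[0])

def benford_chi_squared(values):
    """Pearson chi-squared statistic comparing observed leading-digit counts
    with the counts Benford's Law predicts. Comparing the result against a
    chi-squared distribution with 8 degrees of freedom yields a p value."""
    n = len(values)
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    statistic = 0.0
    for d in range(1, 10):
        expected = n * (log10(d + 1) - log10(d))
        statistic += (counts[d - 1] - expected) ** 2 / expected
    return statistic
```

Powers of 2 famously conform to Benford’s Law, so the statistic for `[2**k for k in range(1, 200)]` comes out small, well below the 5% critical value of about 15.5 for 8 degrees of freedom, while a perfectly uniform spread of leading digits produces a very large statistic.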

Since so many data sets do follow Benford’s Law, this kind of statistical analysis has been used to detect fraud. If you were a budding Enron-style accountant set on falsifying your company’s accounts, you might not be aware of Benford’s Law. As a result, you might end up inventing too many figures starting with 9 and not enough starting with 1. Exactly this style of analysis is described in the 2004 paper The Effective Use of Benford’s Law to Assist in Detecting Fraud in Accounting Data by Durtschi, Hillison and Pacini.

By this point, you are probably asking one question: why does it work? It is an excellent question, and a surprisingly difficult and somewhat controversial one. At current count, an online bibliography of papers on Benford’s Law lists 657 papers on the subject. For me, the best explanation is Fewster’s “simple explanation”, which is based on her “Law of the Stripey Hat”. However simple it may be, it warrants a blog post of its own, so I will be keeping you in suspense a little longer. In the process, I will also explain some circumstances in which you should not expect Benford’s Law to hold (as an example, think about phone numbers in a telephone book).

In the meantime, having gone to the trouble of adapting Fewster’s R code to produce charts testing how closely twitter follower counts fit Benford’s Law, I feel I should share a few more examples. My personal twitter account, @seancarmody, has more followers than @stubbornmule, and the pattern of leading digits in my followers’ follower counts also provides a good illustration of Benford’s Law.

One of my twitter friends, @stilgherrian, has even more followers than I do and so provides an even larger data set.

Even though the bars seem to follow the Benford pattern quite well here, the p value is a rather low 5.5%. This reflects the fact that the larger the sample, the closer the fit should be to the theoretical frequencies if the data set really follows Benford’s Law. This result appears to be largely due to more leading 1s than expected and fewer leading 2s. To get a better idea of what is happening to the follower counts of stilgherrian’s followers, below is a density* histogram of the follower counts on a log10 scale.

There are a few things we can glean from this chart. First, the spike at zero represents accounts with only a single follower, accounting for around 1% of stilgherrian’s followers (since we are working on a log scale, the followers with no followers of their own do not appear on the chart at all). Most of the data is in the range 2 (accounts with 100 followers) to 3 (accounts with 1000 followers). Between 3 and 4 (10,000 followers), the distribution falls off rapidly. This suggests that the deviation from Benford’s Law is due to a fair number of users with a follower count in the 1000–1999 range (I am one of those myself), but a shortage in the 2000–2999 range. Beyond that, the number of data points becomes too small to have much of an effect.
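The binning behind a chart like this is straightforward to reproduce. Here is a hypothetical Python sketch (the original chart was produced in R) that bins log10 follower counts and scales the bars to density, dropping zero-follower accounts since log10(0) is undefined:

```python
from math import log10

def log10_density_histogram(counts, bin_width=0.25):
    """Bin log10(counts) into equal-width bins, scaled so the bar areas
    sum to one. Accounts with zero followers are dropped, since they
    cannot appear on a log scale."""
    logs = [log10(c) for c in counts if c > 0]
    n = len(logs)
    lo = min(logs)
    bins = {}
    for x in logs:
        b = int((x - lo) // bin_width)
        bins[b] = bins.get(b, 0) + 1
    # density = count / (n * bin_width), so the total area is sum(count) / n = 1
    return {round(lo + b * bin_width, 10): c / (n * bin_width)
            for b, c in sorted(bins.items())}
```

Because each bar’s height is its count divided by n times the bin width, the bar areas always sum to one, which is exactly the “density” scaling described in the footnote below.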

Histogram of follower counts of @stilgherrian’s followers

Of course, the point of this analysis is not to suggest that there is anything particularly meaningful about the follower counts of twitter users, but to highlight the fact that even the most peculiar of data sets found “in nature” is likely to yield to the power of Benford’s Law.

* A density histogram scales the vertical axis so that the total area of the histogram is one, rather than showing the raw frequency of occurrences in each bin.