Category Archives: mathematics

Careful with that thing – you could kill someone

It’s been a while, but guest author Mark Lauer is returning to the Mule. While in COVID-induced lockdown, the mind naturally turns to armchair epidemiology, but here Mark goes beyond mere amateur probability to add a sprinkling of ethics.

So, you’re in lockdown during a COVID-19 outbreak in your city. And you’re wondering, now that most of the elderly are vaccinated, if all the fuss is really justified. After all, only a tiny proportion of the city has caught COVID so far, and even if you get it, statistically speaking it is unlikely to harm you. The number of people dying is a small fraction of the population, especially now that effective vaccines are being rolled out. So just how dangerous would it be if you popped down to see that friend you’ve been missing?

It turns out, if you happen to have COVID, it could be rather dangerous indeed.  It may not be too risky for you, or your friend, but let’s do some simple mathematics to see what the consequences might be for others if you do pass on the virus.

In what follows, I’ll focus on the current outbreak here in Sydney, which began on June 16. It’s unusual at this stage of the global pandemic, since the population has lived largely unrestricted for over a year and perhaps some have become complacent about dealing with the virus, despite the carnage and sacrifices of freedom seen overseas. But the general gist applies anywhere that has significant case counts which aren’t falling dramatically.

Please note though, I am not an epidemiologist. There are many more qualified people, building far more sophisticated models. Listen to them and follow their advice.

One obvious factor to consider is how likely it is that you’re infected.  This will vary depending on the number of cases in the outbreak, how many cases are near you, and how often you go shopping or meet others.  But remember it takes several days for testing to reveal where cases are, during which time the outbreak can spread far across the city.  Also many people with COVID are asymptomatic, or at least asymptomatic for a period while they are infectious. None of the people who’ve passed on the virus so far have thought they had it at the time.  And it seems the Delta strain may take as little as a few seconds of contact to transmit.  But let’s set that question aside, and look at what happens if you do transmit it.

So suppose you unknowingly have the virus, and choose between two courses of action: one that passes it on to another person, and one that avoids doing so. From an ethical standpoint, just how bad is it if you opt for the former?

To start, let’s consider the average risk of death for the person you infect. Case fatality rates for COVID-19 are in the range of 1-3% in most countries, but of course these will vary depending on many factors: the standard and capacity of health facilities, who in the population is getting the disease, how many of those are vaccinated, and the virulence of the prevalent strain.

In the Sydney outbreak we’ve had relatively few deaths. As at July 26, there have been 10 in this outbreak, whereas total case counts are now above 2000. However, that neglects the delay between cases being identified and consequent deaths. A study in the Journal of Public Health published in March finds the average lag is 8 days (even longer if a lower proportion of those infected are over 60 years old).

So a more comparable estimate of cases might be the number of locally acquired cases reported up to July 18, which is 1364. That yields a case fatality rate in this outbreak of 0.73%, which is indeed low by global standards of COVID. But while it might seem like a small number, that’s 7300 micromorts, which is equivalent to spending over 7 months as a British soldier serving in Afghanistan.

Now perhaps you and your friend are vaccinated, in which case the mortality risk to you is substantially lower. But while vaccination helps prevent your death, it is far less effective against transmitting the virus. And ethically speaking we need to consider what happens if your friend then passes the virus on further. The probability of this will vary according to the situation. If your friend is actually someone you’re keeping locked in your cellar as a slave, then there’s no way for them to pass it on, and you can feel relieved of any moral qualms about deaths due to passing on the virus further (we can set aside other moral considerations in this scenario; since we’re talking about manslaughter here, why worry about a minor case of enslavement).

Since normally we have little control over how others behave, even friends, let’s assume the friend is exactly like the average other Sydneysider in this outbreak. We can roughly guess the effective reproduction rate of the virus in the conditions of this outbreak by looking at case counts over time. Here is a chart of the number of new locally acquired cases by date during the outbreak so far.

Bar chart of new locally acquired cases in NSW 16.6-26.7.2021

Source: NSW Government

In the 24 days through to the imposition of city-wide stay-at-home restrictions on July 10, new cases grew exponentially to reach 103. For the purpose of this argument, I’ll assume a fixed cycle of infection lasting 3 days (this is not essential, since values below are still valid albeit with slightly different timeframes if the cycle is longer or shorter). A quick calculation (eight 3-day cycles of exponential growth reaching 103 cases means r⁸ ≈ 103) yields a reproduction rate r ≈ 1.8. That is, each infected person infects an average of 1.8 other people every three days.
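This back-of-the-envelope step can be reproduced in a couple of lines (a sketch only; it assumes, as above, exponential growth from a single case to 103 daily cases over eight three-day cycles):

```python
# Rough reproduction of the "quick calculation" (an illustration, not an
# epidemiological model): if cases grow from 1 to 103 over 24 days, i.e.
# eight 3-day infection cycles, the per-cycle reproduction rate r solves
# r**8 = 103.
cycles = 24 / 3
r = 103 ** (1 / cycles)  # ~1.78, rounded to 1.8 in the text
```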

At this level of transmission, 100 people will infect 180 people in three days, who will infect another 324 people after six days, and so on. If this continues for 15 days, the total number of resulting infections, counting the original 100, will be 4126, or 41.26 people per original infected person. If each of the 41 people infected via our friend has a 0.73% chance of dying from the virus, there is over a 25% chance that at least one person will die. And that’s only counting infections in the next 15 days. Giving the virus to one person is significantly worse than Russian roulette under these conditions.
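The chain-of-infection arithmetic can be sketched in a few lines of Python (an illustration of the back-of-the-envelope model in the text, not a forecast):

```python
# One infection at r = 1.8 per 3-day cycle leads, over 15 days (5 further
# cycles), to a geometric series of cases (including the original infection);
# each case carries the outbreak's estimated 0.73% fatality rate.
r, cfr = 1.8, 0.0073
cases_per_infection = sum(r**k for k in range(6))  # 1 + r + ... + r**5, ~41.27
p_at_least_one_death = 1 - (1 - cfr) ** cases_per_infection  # ~0.26
```

The ~26% figure is what the text compares (unfavourably) with Russian roulette's 1-in-6.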

Of course, as Sydneysiders are uncomfortably aware, the government here has been instituting successively more stringent restrictions across the city. And in the two weeks or so since July 11, the growth in case counts has happily slowed somewhat. Unfortunately, lockdown efforts so far appear to be insufficient to bring case counts down dramatically, with over 170 new cases reported on July 25.

But let’s be wildly optimistic and say that the reproduction rate is now down to 0.9. In that case, 100 people infect 90 people who infect 81 people, so that after 15 days the expected total number of resulting cases is 469. Your single transmission to your friend then leads to around 3.4% chance of at least one death as a result of infection in the next 15 days.
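The same arithmetic, sketched in Python under the optimistic assumption of r = 0.9 (again, an illustration of the text's simple model only):

```python
# Optimistic lockdown scenario: r = 0.9 per 3-day cycle, 15 days (5 cycles).
r, cfr = 0.9, 0.0073
cases_per_infection = sum(r**k for k in range(6))  # ~4.69, i.e. ~469 per 100
p_at_least_one_death = 1 - (1 - cfr) ** cases_per_infection  # ~0.034
```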

While that’s much better than before the citywide restrictions, it is nothing to shrug off. It’s similar to the chance of dying:

Most would agree that all these events have a “reasonable chance of killing someone”. And so too does passing on the virus under the current Sydney outbreak conditions.

So please, please be careful. Your choices can save lives.

Bob

Last year I wrote on a couple of occasions about the Sleeping Beauty problem. The problem raises some tricky questions and I did promise to attempt to answer the questions, which I am yet to do. Only last week, I was discussing the problem again with my friend Giulio, whose paper on the subject I published here. That discussion prompted me to go back to the inspiration for the original post: a series of posts on the Bob Walter’s blog. I re-read all of his posts, including his fourth post on the topic, which began:

I have been waiting to see some conclusions coming from discussions of the problem at the Stubborn Mule blog, however the discussion seems to have petered out without a final statement.

Sadly, even if I do get to my conclusions, I will not be able to get Bob’s reaction, because last week he died and the world has lost a great, inspirational mathematician.

Bob was my supervisor in the Honours year of my degree in mathematics and he also supervised Giulio for his PhD. Exchanging emails with Giulio this week, we both have vivid memories of an early experience of Bob’s inspirational approach to mathematics. This story may not resonate for everyone, but I can assure you that there are not many lectures from over 25 years ago that I can still recall.

The scene was a 3rd year lecture on Categories in Computer Science. Bob started talking about stacks, a very basic data structure used in computing. You should think of a stack of plates: you can put a plate on the top of the stack, or you can take one off. Importantly, you only push on or pop off plates from the top of the stack (unless you want your stack to crash to the floor). And how should a mathematician think about a stack? As Bob explained it, from the perspective of a category theorist, the key to understanding stacks is to think about pushing and popping as inverse functions. Bear with me, and I will take you through his explanation.

Rather than being a stack of plates, we will work with a stack of a particular type of data and I will denote by X the set of possible data elements (X could denote integers, strings, Booleans, or whatever data type you like). Stacks of type X will then be denoted by S. Our two key operations are push and pop.

The push operation takes an element of X and a stack and returns a new stack, which is just the old stack with the element of X added on the top. So, it’s a function push: X × S → S. Conversely, pop is a function pop: S → X × S which takes a stack and returns the top element and a stack, which is everything that’s left after you pop the top.

So far, so good, but there are some edge cases to worry about. We should be able to deal with an empty stack, but what if we try to pop an element from the empty stack? That doesn’t work, but we can deal with this by returning an error state. This means that we should really think of pop as a function pop: S → X × S + I, where I is a single element set, say {ERR}. Here the + is a (disjoint) union of sets, which means that the pop function will either return a pair (an element of X and a stack) or an error state. This might be a bit confusing, so to make it concrete, imagine I have a stack s = (x1, x2, x3), then

pop((x1, x2, x3)) = (x1, (x2, x3))

and this ordered pair of data element x1 and (shorter) stack (x2, x3) is an element of X × S. Now if I want to pop an empty stack (), I have

pop(()) = ERR

which is in I. So pop will always either return an element of X × S or an element of I (in fact, the only element there is).

This should prompt us to revisit push as well, which should really be considered as a function push: X × S + I → S which, given an element of X and a stack will combine them, but given the single element of I will return an empty stack, so push(ERR) = ().

The key insight now is that pop and push are inverses of each other. If I push an element onto a stack and pop it again, I get back my element and the original stack. If I pop an element from a stack and push it back on, I get back my original stack. Extending these functions to all of X × S + I, as above, ensures that this holds true even for the edge cases.
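Bob's picture of push and pop as mutually inverse functions is easy to sketch in code. This is my own illustration (not code from the lecture), modelling stacks as Python tuples and the sole element of I by the token ERR:

```python
# push: X x S + I -> S  and  pop: S -> X x S + I, as mutually inverse functions.
ERR = "ERR"  # the single element of the one-element set I

def push(v):
    """Given a pair (element, stack), put the element on top; given ERR,
    return the empty stack."""
    if v == ERR:
        return ()
    x, s = v
    return (x,) + s

def pop(s):
    """Return (top element, rest of stack), or ERR for the empty stack."""
    if s == ():
        return ERR
    return (s[0], s[1:])

# Inverse in both directions, edge cases included:
assert pop(push((1, (2, 3)))) == (1, (2, 3))
assert push(pop((1, 2, 3))) == (1, 2, 3)
assert push(pop(())) == ()   # ERR round-trips to the empty stack
assert pop(push(ERR)) == ERR
```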

But if push and pop are inverses, then X × S + I and S must essentially be the same: mathematically, they are isomorphic. This is where the magic begins. As Bob said in his lecture, “let’s be bold like Gauss”, and he proceeded with the following calculation:

X × S + I = S

I = S – X × S = S × (I – X)

S = I / (I – X)

and so

S = I + X + X² + X³ + …

The last couple of steps are the bold ones, but actually make sense. The last equation basically says that a stack is either an empty stack, a single element of X, an ordered pair of elements of X, an ordered triple of elements of X and so on.
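A small brute-force check of my own: enumerating all stacks over a two-element X up to length three gives exactly 1 + 2 + 4 + 8 of them, one count for each term of the series.

```python
from itertools import product

# S = I + X + X^2 + X^3 + ... : a stack is either empty, or a single element,
# or an ordered pair, or an ordered triple, and so on. With |X| = 2, the
# first four terms contribute 1 + 2 + 4 + 8 stacks.
X = [0, 1]
stacks = [s for n in range(4) for s in product(X, repeat=n)]
assert len(stacks) == 1 + 2 + 4 + 8
```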

I’d known since high school that 1/(1-x) could be expanded to 1 + x + x² + x³ + …, but applying this to a data structure like stacks was uncharted territory. I was hooked, and the following year I had the privilege of working with Bob on my honours thesis and that work ultimately made it into a couple of joint papers with Bob.

I haven’t seen Bob face to face for many years now, but we have occasionally kept in touch through topics of mutual interest on our blogs. While I have not kept up with his academic work, I have always considered him more than just a brilliant mathematician. He was a creative, inspirational, radical thinker who had an enormous influence on me and, I know, many others.

RFC Walters, rest in peace.

Sleeping Beauty – a “halfer” approach

If you read the last post on the Sleeping Beauty problem, you may recall I did not pledge allegiance to either the “halfer” or the “thirder” camp, because I was still thinking my position through. More than a month later, I still can’t say I am satisfied. Mathematically, the thirder position seems to be the most coherent, but intuitively, it doesn’t seem quite right.

Mathematically the thirder position works well because it is the same as a simpler problem. Imagine the director of the research lab drops in to see how things are going. The director knows all of the details of the Sleeping Beauty experiment, but does not know whether today is day one or two of the experiment. Looking in, she sees Sleeping Beauty awake. To what degree should she believe that the coin toss was Heads? Here there is no memory-wiping and the problem fits neatly into standard applications of probability and the answer is 1/3.

My intuitive difficulty with the thirder is better expressed with a more extreme version of the Sleeping Beauty problem. Instead of flipping the coin once, the experimenters flip the coin 19 times. If there are 19 tails in a row (which has a probability of 1 in 524,288), Sleeping Beauty will be woken 1 million times. Otherwise (i.e. if there was at least one Heads tossed), she will only be woken once. Following the standard argument of the thirders, when Sleeping Beauty is awoken and asked for her degree of belief that the coin tosses turned up at least one Heads, she should say approximately 1/3 (or more precisely, 524287/1524287). Intuitively, this doesn’t seem right. Notwithstanding the potential for 1 million awakenings, I would find it hard to bet against something that started off as a 524287/524288 chance. Surely when Sleeping Beauty wakes up, she would be quite confident that at least one Heads came up and she is in the single awakening scenario.
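For what it's worth, the exact thirder figure quoted above falls out of weighting each outcome by its number of awakenings. A sketch of that standard thirder calculation (my own code, and not an endorsement of the position):

```python
# Extreme variant: 19 fair tosses. All tails (probability 1/2**19) means
# 1,000,000 awakenings; any other outcome means a single awakening.
p_all_tails = 1 / 2**19                     # 1 in 524,288
w_tails = p_all_tails * 1_000_000           # awakening-weighted all-tails branch
w_heads = (1 - p_all_tails) * 1             # awakening-weighted heads branch
credence_heads = w_heads / (w_heads + w_tails)  # = 524287/1524287, ~0.344
```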

Despite the concerns my intuition throws up, the typical thirder argues that Sleeping Beauty should assign 1/3 to Heads on the basis that she and the director have identical information. For example, here is an excerpt from a comment by RSM on the original post:

I want to know if halfers believe that two people with identical information about a problem, and with an identical set of priors, should assign identical probabilities to a hypothesis. I see the following possibilities:

  1. The answer is no -> could be a halfer (but not necessarily).
  2. The answer is yes, but the person holds that conditionalization is not a valid procedure –> could be a halfer.
  3. The answer is yes and the person accepts conditionalization, but does not accept that the priors for the four possibilities in the Sleeping Beauty puzzle should be equal –> could be a halfer.
  4. Otherwise, must be a thirder.

My intuition suggests, in a way I struggle to make precise, that Sleeping Beauty and the director do not in fact have identical information. All I can say is that Sleeping Beauty knows she will be awake on Monday (even if she subsequently forgets the experience), but the director may not observe Sleeping Beauty on Monday at all.

Nevertheless, option 2 raises interesting possibilities, ones that have been explored in a number of papers. For example, in D.J. Bradley’s “Self-location is no problem for conditionalization”, Synthese 182, 393–411 (2011), it is argued that learning about temporal information involves “belief mutation”, which requires a different approach to updating beliefs than “discovery” of non-temporal information, which makes use of conditionalisation.

All of this serves as a somewhat lengthy introduction to an interesting approach to the problem developed by Giulio Katis, who first introduced me to the problem. The Stubborn Mule may not be a well-known mathematical imprint, but I am pleased to be able to publish his paper, Sleeping Beauty, the probability of an experiment being in a state, and composing experiments, here on this site. In this post I will include excerpts from the paper, but encourage those interested in a mathematical framing of a halfer’s approach to read the paper in full. I am sure that Giulio will welcome comments on the paper.

Giulio begins:

The view taken in this note is that the contention between halfers and thirders over the Sleeping Beauty (SB) problem arises primarily for two reasons. The first reason relates to exactly what experiment or frame of reference is being considered: the perspective of SB inside the experiment, or the perspective of an external observer who chooses to randomly inspect the state of the experiment. The second reason is that confusion persists because most thirders and halfers have not explicitly described their approach in terms of generally defining a concept such as “the probability of an experiment being in a state satisfying a property P conditional on the state satisfying property C”.

Here Giulio harks back to Bob Walters’ distinction between experiments and states. In the context of the Sleeping Beauty problem, the “experiment” is a full run from coin toss through Monday and Tuesday, a state is a particular point in the experiment and, as an example, P could be a state in which the coin toss was Heads and C a state in which Sleeping Beauty is awake.

From here, Giulio goes on to describe two possible “probability” calculations. The first would be familiar to thirders and Giulio notes:

What thirders appear to be calculating is the probability that an external observer randomly inspecting the state of an experiment finds the state to be satisfying P. Indeed, someone coming to randomly inspect this modified SB problem (not knowing on what day it started) is twice as likely to find the experiment in the case where tails was tossed. This reflects the fact that the reference frame or ‘timeframe’ of this external observer is different to that of (or, shall we say, to that ‘inside’) the experiment they have come to observe. To formally model this situation would seem to require modelling an experiment being run within another experiment.

The halfer approach is then characterised as follows:

The halfers are effectively calculating as follows: first calculate for each complete behaviour of the experiment the probability that the behaviour is in a state satisfying property P; and then take the expected value of this quantity with respect to the probability measure on the space of behaviours of the experiment. Denote this quantity by ΠX(P) .

An interesting observation about this definition follows:

Note that even though at the level of each behaviour the ‘probability of being in a state satisfying P’ is a genuine probability measure, the quantity ΠX(P) is not in general a probability measure on the set of states of X . Rather, it is an expected value of such probabilities. Mathematically, it fails in general to be a probability measure because the normalization denominators n(p) may vary for each path. Even though this is technically not a probability measure, I will, perhaps wrongly, continue to call ΠX(P) a probability.

I think that this is an important observation. As I noted at the outset, the mathematics of the thirder position “works”, but typically halfers end up facing all sorts of nasty side-effects. For example, an incautious halfer may be forced to conclude that, if the experimenters tell Sleeping Beauty that today is Monday then she should update her degree of belief that the coin toss came up Heads to 2/3. In the literature there are some highly inelegant attempts to avoid these kinds of conclusions. Giulio’s avoids these issues by embracing the idea that, for the Sleeping Beauty problem, something other than a probability measure may be more appropriate for modelling “credence”:

I should say at this point that, even though ΠX(P) is not technically a probability, I am a halfer in that I believe it is the right quantity SB needs to calculate to inform her degree of ‘credence’ in being in a state where heads had been tossed. It does not seem ΞX(P) [the thirders’ probability] reflects the temporal or behavioural properties of the experiment. To see this, imagine a mild modification of the SB experiment (one where the institute in which the experiment is carried out is under cost pressures): if Heads is tossed then the experiment ends after the Monday (so the bed may now be used for some other experiment on the Tuesday). This experiment now runs for one day less if Heads was tossed. There are two behaviours of the experiment: one we denote by pTails which involves passing through two states S1 = (Mon, Tails), S2 = (Tue, Tails); and the other we denote by pHeads which involves passing through one state S3 = (Mon, Heads). Let P = {S3}, which corresponds to the behaviour pHeads. That is, to say the experiment is in P is the same as saying it is in the behaviour pHeads. Note π(pHeads) = 1/2, but ΞX(P) = 1/3. So the thirders’ view is that the probability of the experiment being in the state corresponding to the behaviour pHeads (i.e. the probability of the experiment being in the behaviour pHeads) is actually different to the probability of pHeads occurring!

This halfer “probability” has some interesting characteristics:

There are some consequences of the definition for ΠX(P) above that relate to what some thirders claim are inconsistencies in the halfers’ position (to do with conditioning). In fact, in the context of calculating such probabilities, a form of ‘interference’ can arise for the series composite of two experiments (i.e. the experiment constructed as ‘first do experiment 1, then do experiment 2’), which does not arise for the probabilistic join of two experiments (i.e. the experiment constructed as ‘with probability p do experiment 1, with probability 1-p do experiment 2’).

In a purely formal manner (and, of course, not in a deeper physical sense) this ‘non-locality’, and the importance of defining the starting and ending states of an experiment when calculating probabilities, reminds me of the interference of quantum mechanical experiments (as, say, described by Feynman in the gem of a book QED). I have no idea if this formal similarity has any significance at all or is completely superficial.

Giulio goes on to make an interesting conjecture about composition of Sleeping Beauty experiments:

We could describe this limiting case of a composite experiment as follows. You wake up in a room with a white glow. A voice speaks to you. “You have died, and you are now in eternity. Since you spent so much of your life thinking about probability puzzles, I have decided you will spend eternity mostly asleep and only be awoken in the following situations. Every Sunday I will toss a fair coin. If the toss is tails, I will wake you only on Monday and on Tuesday that week. If the toss is heads, I will only wake you on Monday that week. When you are awoken, I will say exactly the same words to you, namely what I am saying now. Shortly after I have finished speaking to you, I will put you back to sleep and erase the memory of your waking time.” The voice stops. Despite your sins, you can’t help yourself, and in the few moments you have before being put back to sleep you try to work out the probability that the last toss was heads. What do you decide it is?

In this limit, Giulio argues that a halfer progresses to the thirder position, assigning 1/3 to the probability that the last toss was heads!

These brief excerpts don’t do full justice to the framework Giulio has developed, but I do consider it a serious attempt to encompass all of the temporal/non-temporal, in-experiment/out-of-experiment subtleties that the Sleeping Beauty problem throws up. This paper is only for the mathematically inclined and, like so much written on this subject, I doubt it will convince many thirders, but if nothing else I hope it will put Giulio’s mind at rest having the paper published here on the Mule. Over recent weeks, his thoughts have been as plagued by this problem as have mine.

Sleeping Beauty

For the last couple of weeks, I have fallen asleep thinking about Sleeping Beauty. Not the heroine of the Charles Perrault fairy tale, or her Disney descendant, but the subject of a thought experiment first described in print by philosopher Adam Elga as follows:

Some researchers are going to put you to sleep. During the two days that your sleep will last, they will briefly wake you up either once or twice, depending on the toss of a fair coin (Heads: once; Tails: twice). After each waking, they will put you back to sleep with a drug that makes you forget that waking. When you are first awakened, to what degree ought you believe that the outcome of the coin toss is Heads?

Elga, A. “Self‐locating belief and the Sleeping Beauty problem”, Analysis 60, 143–147 (2000)

It has become traditional to add that Sleeping Beauty is initially put to sleep on Sunday and is either woken up on Monday (Heads) or Monday and Tuesday (Tails). Then on Wednesday she is woken for the final time and the experiment is over. She knows in advance exactly what is going to take place, believes the experimenters and trusts that the coin is fair.

Much like the Monty Hall problem, Sleeping Beauty has stirred enormous controversy. There are two primary schools of thought on this problem: the thirders and the halfers. Both sides have a broad range of arguments, but put simply they are as follows.

Halfers argue that the answer is 1/2. On Sunday Sleeping Beauty believed that the chance of Heads was 1/2, she has learned nothing new when waking and so the chances are still 1/2.

Thirders argue that the answer is 1/3. If the experiment is repeated over and over again, approximately 1/3 of the time she will wake up after Heads and 2/3 of the time she will wake up after Tails.
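The thirder's frequency argument is easy to simulate. A quick sketch of my own (not from either camp's literature): run the experiment many times and count what fraction of awakenings follow a Heads toss.

```python
import random

# Heads -> 1 awakening (Monday); Tails -> 2 awakenings (Monday and Tuesday).
random.seed(1)
heads_awakenings = total_awakenings = 0
for _ in range(100_000):
    heads = random.random() < 0.5
    wakes = 1 if heads else 2
    total_awakenings += wakes
    heads_awakenings += wakes if heads else 0
frac_heads = heads_awakenings / total_awakenings  # close to 1/3
```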

I first came across this problem myself when a friend alerted me to a blog post by my former supervisor Bob Walters, who describes the thirder position as an “egregious error”. But as Bob notes, there are many in the thirder camp, including Adam Elga himself, physicist Sean Carroll and statistician Jeffrey Rosenthal.

As for my own view, I will leave you in suspense for now, mainly because I’m still thinking it through. Although superficially similar, I believe that it is a far more subtle problem than the Monty Hall problem and poses challenges to what it means to move the pure mathematical theory of probability to a real world setting. Philosophers distinguish between the mathematical concept of “probability” and real world “credence”, a Bayesian style application of probability to real world beliefs. I used to think that this was a bit fanciful on the part of philosophers. Now I am not so sure: applying probability is harder than it looks.

Let me know what you think!

Image Credit: Serena-Kenobi

Randomness revisited (mathsy)

My recent randomness post hinged on people’s expectations of how long a run of heads or tails you can expect to see in a series of coin tosses. In the post, I suggested that people tend to underestimate the length of runs, but what does the maths say? The exploration of the numbers in this post draws on the excellent 1991 paper “The Longest Run of Heads” by Mark Schilling, which would be a good starting point for further reading for the mathematically inclined.

When I ran the experiment with the kids, I asked them to try to simulate 100 coin tosses, writing down a sequence of heads and tails. Their longest sequence was 5 heads, but on average, for 100 tosses, the length of the longest run (which can be either heads or tails) is 7. Not surprisingly, this figure increases for a longer sequence of coin tosses. What might be a bit more surprising is how slowly the length of the longest run grows: just to bump up the average length from 7 to 8, the number of tosses has to double from 100 to 200. It turns out that the average length of the longest run grows approximately logarithmically with the total number of tosses. This formula gives a pretty decent approximation of the expected length:

average length of longest run in n tosses ≈ log₂(n) + 1/3

The larger the value of n, the better the approximation and once n reaches 20, the error falls below 0.1%.
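The approximation is easy to check by simulation. Here is a sketch of my own (`longest_run` is my own helper, not from Schilling's paper):

```python
import math
import random

def longest_run(seq):
    """Length of the longest run of identical outcomes in seq."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

random.seed(0)
n, trials = 100, 20_000
avg = sum(longest_run(random.choices("HT", k=n)) for _ in range(trials)) / trials
approx = math.log2(n) + 1 / 3  # ~6.98 for n = 100; avg comes out close to 7
```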


Growth of the Longest Run

However, averages (or, technically, expected values) like this should be used with caution. While the average length of the longest run seen in 100 coin tosses is 7, that does not mean that the longest run will typically have length 7. The probability distribution of the length of the longest run is quite skewed, as is evident in the chart below. The most likely length for the longest run is 6, but there is always a chance of getting a much longer run (more so than very short runs, which can’t fall below 1) and this pushes up the average length of the longest run.

Distribution of the Longest Run in 100 coin tosses

What the chart also shows is that the chance of the longest run only being 1, 2 or 3 heads or tails long is negligible (less than 0.03%). Even going up to runs of up to 4 heads or tails adds less than 3% to the cumulative probability. So, the probability that the longest run has length at least 5 is a little over 97%. If you ever try the coin toss simulation experiment yourself and you see a supposed simulation which does not have a run of at least 5, it’s a good bet that it was the work of a human rather than a random coin.

Like the average length of the longest run, this probability distribution shifts (approximately) logarithmically as the number of coin tosses increases. With a sequence of 200 coin tosses, the average length of the longest run is 8, the most likely length for the longest run is 7 and the chances of seeing a run of at least 5 heads or tails in a row is now over 99.9%. If your experimental subjects have the patience, asking them to simulate 200 coin tosses makes for even safer ground on which to prove your randomness detection skills.

Distribution of the Longest Run in 200 coin tosses

What about even longer runs? The chart below shows how the chances of getting runs of a given minimum length increase with the length of the coin toss sequence. As we’ve already seen, the chances of seeing a run of at least 5 gets high very quickly, but you have to work harder to see longer runs. In 100 coin tosses, the probability that the longest run has length at least 8 is a little below 1/3 and is still only just over 1/2 in 200 tosses. Even in a sequence of 200 coin tosses, the chances of seeing at least 10 heads or tails in a row is only 17%.


Longest Run probabilities

Getting back to the results of the experiment I conducted with the kids, the longest run for both the real coin toss sequence and the one created by the children was 5 heads. So, none of the results here could help to distinguish them. Instead, I counted the number of “long” runs. Keeping the distribution of long runs for 100 tosses in mind, I took “long” to be any run of 4 or more heads or tails. To calculate the probability distribution for “long” runs, I used simulation*, generating 100,000 separate samples of 100 coin toss sequences. The chart below shows the results, giving an empirical estimate of the probability distribution for the number of runs of 4 or more heads or tails in a sequence of 100 coin tosses. The probability of seeing no more than two of these “long” runs is only 2%, while the probability of seeing 5 or more is 81%.
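Here is a sketch of that simulation — my own reconstruction, using 20,000 samples rather than 100,000 for brevity, with `count_long_runs` a helper name of my own invention:

```python
import random

def count_long_runs(seq, min_len=4):
    """Count maximal runs of identical outcomes of length >= min_len."""
    count, run = 0, 1
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            run += 1
        else:
            count += run >= min_len  # close off the run just ended
            run = 1
    count += run >= min_len          # don't forget the final run
    return count

random.seed(7)
samples = [count_long_runs(random.choices("HT", k=100)) for _ in range(20_000)]
p_at_most_2 = sum(c <= 2 for c in samples) / len(samples)   # ~0.02
p_at_least_5 = sum(c >= 5 for c in samples) / len(samples)  # ~0.81
```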

These results provide the ammunition for uncovering the kids’ deceptions. Quoting from the Randomness post:

One of the sheets had three runs of 5 in a row and two runs of 4, while the other had only one run of 5 and one run of 4.

So, one of the sheets was in the 81% bucket and one in the 2% bucket. I guessed that the former was the record of coin tosses and the latter was devised by the children. That guess turned out to be correct and my reputation as an omniscient father was preserved! For now.

Runs at least 4 long

If you have made it this far, I would encourage you to do the following things (particularly the first one):

  1. Listen to Stochasticity, possibly the best episode of the excellent Radiolab podcast, which features the coin toss challenge
  2. Try the experiment on your own family or friends (looking for at least 3 runs of 5 or more heads or tails and ideally at least one of 6 or more)
  3. Share your results in the comments below.

I look forward to hearing about any results.

* UPDATE: I subsequently did the exact calculations, which confirmed that these simulated results were quite accurate.

Randomness

With three children, I have my own laboratory at home for performing psychological experiments. Before anyone calls social services, there is an ethical committee standing by (their mother).

This evening, I tried out one of my favourites: testing the perception of randomness. Here is the setup: I gave the boys two pieces of paper and a 20 cent coin. I was to leave the room, and they had to decide which of the two sheets of paper would be filled in by the boys and which by the coin. Having made their choice, they then had to write down on one of the sheets their best attempt at a “random” sequence of 100 heads (H) and tails (T). Having done that, they were to toss the coin 100 times, writing down on the other page the sequence of heads and tails that came up. I would then return to the room and guess which sequence was determined by the toss of the coin, and which by the boys.

I identified which sequence was which in less than 30 seconds. How did I do it?

The trick is to look for the longer sequences. Much like the gambler at the roulette wheel, the kids assume that a run of heads cannot last too long. One of the sheets had three runs of 5 in a row and two runs of 4, while the other had only one run of 5 and one run of 4. I correctly picked that the sheet with more long runs was determined by the coin toss.

Try it yourself sometime. If you see a run of 6 or more (which is in fact quite probable in a sequence of 100 coin tosses), you can quite confidently pick that as the coin toss, unless your subject has been well schooled in probability.

Our intuition struggles with randomness. We tend to assume randomness is more regular than it is. On the other hand, we also try to find patterns where there is only randomness, whether it is the man in the moon, clouds that look like things, the face of Mary on a piece of toast or, perhaps,  an explanation for the disappearance of MH 370.

Benford’s Law

Here is a quick quiz. If you visit the Wikipedia page List of countries by GDP, you will find three lists ranking the countries of the world in terms of their Gross Domestic Product (GDP), each list corresponding to a different source of the data. If you pick the list according to the CIA (let’s face it, the CIA just sounds more exciting than the IMF or the World Bank), you should have a list of figures (denominated in US dollars) for 216 countries. Ignore the fact that the European Union is in the list along with the individual countries, and think about the first digit of each of the GDP values. What proportion of the data points start with 1? How about 2? Or 3 through to 9?

If you think they would all be about the same, you have not come across Benford’s Law. In fact, far more of the national GDP figures start with 1 than any other digit and fewer start with 9 than any other digit. The columns in the chart below show the distribution of the leading digits (I will explain the dots and bars in a moment).

Distribution of leading digits of GDP for 216 countries (in US$)

This phenomenon is not unique to GDP. Indeed a 1937 paper described a similar pattern of leading digit frequencies across a baffling array of measurements, including areas of rivers, street addresses of “American men of Science” and numbers appearing in front-page newspaper stories. The paper was titled “The Law of Anomalous Numbers” and was written by Frank Benford, who thereby gave his name to the phenomenon.

Benford’s Law of Anomalous Numbers states that for many datasets, the proportion of data points with leading digit n will be approximated by

log10(n+1) – log10(n).

So, around 30.1% of the data should start with a 1, while only around 4.6% should start with a 9. The horizontal lines in the chart above show these theoretical proportions. It would appear that the GDP data features more leading 2s and fewer leading 3s than Benford’s Law would predict, but it is a relatively small sample of data, so some variation from the theoretical distribution should be expected.
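The formula is easy to evaluate for yourself. Here is a quick check in Python (illustrative only; the blog’s own charts were produced in R) of the proportions it predicts:

```python
import math

def benford_proportions():
    """Expected proportion of each leading digit under Benford's Law."""
    return {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}
```

Note that the nine proportions necessarily sum to 1, since the sum telescopes to log10(10) − log10(1).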

As a variation of the usual tests of Benford’s Law, I thought I would choose a rather modern data set to test it on: Twitter follower numbers. Fortunately, there is an R package perfectly suited to this task: twitteR. With twitteR installed, I looked at all of the twitter users who follow @stubbornmule and recorded how many users follow each of them. With only a relatively small follower base, this gave me a set of 342 data points which follows Benford’s Law remarkably well.


Distribution of leading digits of follower counts

As a measure of how well the data follows Benford’s Law, I have adopted the approach described by Rachel Fewster in her excellent paper A Simple Explanation of Benford’s Law. For the statistically-minded, this involves defining a chi-squared statistic which measures “badness” of Benford fit. This statistic provides a “p value”, which you can think of as the probability that Benford’s Law could produce a distribution that looks like your data set. The p value for the @stubbornmule follower counts is a very high 0.97, which shows a very good fit to the law. By way of contrast, if those 342 data points had a uniform distribution of leading digits, the p value would be less than 10^-15, which would be a convincing violation of Benford’s Law.
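Fewster’s paper gives the exact recipe and her R code computes the p value; as an indicative sketch only, the underlying chi-squared statistic can be computed in a few lines of Python (the helper names here are mine, not Fewster’s). A value above 15.5, the 5% critical value for the chi-squared distribution with 8 degrees of freedom, would signal a poor fit.

```python
import math
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive integer."""
    while x >= 10:
        x //= 10
    return x

def benford_chi_squared(values):
    """Chi-squared statistic comparing the observed leading digits of
    a list of positive integers with the Benford proportions."""
    digits = Counter(leading_digit(v) for v in values)
    n = sum(digits.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * (math.log10(d + 1) - math.log10(d))
        stat += (digits.get(d, 0) - expected) ** 2 / expected
    return stat  # compare against chi-squared with 8 degrees of freedom
```

A data set whose leading digits match the Benford proportions gives a statistic near zero, while a uniform spread of leading digits gives a very large one.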

Since so many data sets do follow Benford’s Law, this kind of statistical analysis has been used to detect fraud. If you were a budding Enron-style accountant set on falsifying your company’s accounts, you may not be aware of Benford’s Law. As a result, you may end up inventing too many figures starting with 9 and not enough starting with 1. Exactly this style of analysis is described in the 2004 paper The Effective Use of Benford’s Law to Assist in Detecting Fraud in Accounting Data by Durtschi, Hillison and Pacini.

By this point, you are probably asking one question: why does it work? It is an excellent question, and a surprisingly difficult and somewhat controversial one. At current count, an online bibliography of papers on Benford’s Law lists 657 papers on the subject. For me, the best explanation is Fewster’s “simple explanation”, which is based on her “Law of the Stripey Hat”. However simple it may be, it warrants a blog post of its own, so I will be keeping you in suspense a little longer. In the process, I will also explain some circumstances in which you should not expect Benford’s Law to hold (as an example, think about phone numbers in a telephone book).

In the meantime, having gone to the trouble of adapting Fewster’s R Code to produce charts testing how closely twitter follower counts fit Benford’s Law, I feel I should share a few more examples. My personal twitter account, @seancarmody, has more followers than @stubbornmule and the pattern of leading digits in my followers’ follower counts also provides a good illustration of Benford’s Law.

One of my twitter friends, @stilgherrian, has even more followers than I do and so provides an even larger data set.

Even though the bars seem to follow the Benford pattern quite well here, the p value is a rather low 5.5%. This reflects the fact that the larger the sample, the closer the fit should be to the theoretical frequencies if the data set really follows Benford’s Law. This result appears to be largely due to more leading 1s than expected and fewer leading 2s. To get a better idea of what is happening to the follower counts of stilgherrian’s followers, below is a density* histogram of the follower counts on a log10 scale.

There are a few things we can glean from this chart. First, the spike at zero represents accounts with only a single follower, accounting for around 1% of stilgherrian’s followers (since we are working on a log scale, the followers with no followers of their own do not appear on the chart at all). Most of the data is in the range 2 (accounts with 100 followers) to 3 (accounts with 1,000 followers). Between 3 and 4 (10,000 followers), the distribution falls off rapidly. This suggests that the deviation from Benford’s Law is due to a fair number of users with a follower count in the 1000-1999 range (I am one of those myself), but a shortage in the 2000-2999 range. Beyond that, the number of data points becomes too small to have much of an effect.

Histogram of follower counts of @stilgherrian’s followers

Of course, the point of this analysis is not to suggest that there is anything particularly meaningful about the follower counts of twitter users, but to highlight the fact that even the most peculiar of data sets found “in nature” is likely to yield to the power of Benford’s Law.

* A density histogram scales the vertical axis to ensure that the histogram covers a region of area one rather than the frequency of occurrences in each bin.

More colour wheels

In response to my post about colour wheels, I received a suggested enhancement from Drew. The idea is to first match colours based on the text provided and then add nearby colours. This can be done by ordering colours in terms of hue, saturation, and value. The result is a significant improvement and it will capture all of those colours with more obscure names.

Here is my variant of Drew’s function:

col.wheel <- function(str, nearby=3, cex=0.75) {
	cols <- colors()
	hsvs <- rgb2hsv(col2rgb(cols))
	srt <- order(hsvs[1,], hsvs[2,], hsvs[3,])
	cols <- cols[srt]
	ind <- grep(str, cols)
	if (length(ind) < 1) stop("no colour matches found",
		call.=FALSE)
	s.ind <- ind
	if (nearby > 0) for (i in 1:nearby) {
		s.ind <- c(s.ind, ind+i, ind-i)
	}
	ind <- sort(unique(s.ind))
	ind <- ind[ind >= 1 & ind <= length(cols)]
	cols <- cols[ind]
	pie(rep(1, length(cols)), labels=cols, col=cols, cex=cex)
	cols
}

I have included an additional parameter, nearby, which specifies the range of additional colours to include. A setting of 1 will include colours matching the specified string and also one colour on either side of each of these. By default, nearby is set to 3.

The wheel below shows the results for col.wheel(“thistle”, nearby=5). As well as the various shades of “thistle”, this also uncovers “plum” and “orchid”.

"Thistle" wheel

This is far more powerful than the original function: thanks Drew.

Colour wheels in R

Regular readers will know I use the R package to produce most of the charts that appear here on the blog. Being more quantitative than artistic, I find choosing colours for the charts to be one of the trickiest tasks when designing a chart, particularly as R has so many colours to choose from.

In R, colours are specified by name, with names ranging from the obvious “red”, “green” and “blue” to the obscure “mintcream”, “moccasin” and “goldenrod”. The full list of 657 named colours can be found using the colors() function, but that is a long list to wade through if you just want to get exactly the right shade of green, so I have come up with a shortcut which I thought I would share here*.

Below is a simple function called col.wheel which will display a colour wheel of all the colours matching a specified keyword.

col.wheel <- function(str, cex=0.75) {
	cols <- colors()[grep(str, colors())]
	pie(rep(1, length(cols)), labels=cols, col=cols, cex=cex)
	cols
}

To use the function, simply pass it a string:

col.wheel("rod")

As well as displaying the colour wheel below, this will return a list of all of the colour names which include the specified string.

 [1] "darkgoldenrod"        "darkgoldenrod1"       "darkgoldenrod2"
 [4] "darkgoldenrod3"       "darkgoldenrod4"       "goldenrod"
 [7] "goldenrod1"           "goldenrod2"           "goldenrod3"
[10] "goldenrod4"           "lightgoldenrod"       "lightgoldenrod1"
[13] "lightgoldenrod2"      "lightgoldenrod3"      "lightgoldenrod4"
[16] "lightgoldenrodyellow" "palegoldenrod"

"Rod" colour wheel
In fact, col.wheel will accept a regular expression, so you could get fancy and ask for colours matching “r.d” which will give you all the reds and the goldenrods.

This trick does have its limitations: you are not likely to find “thistle”, “orchid” or “cornsilk” this way, but I have found it quite handy and others may too.

*My tip was inspired by this page about R colours.

Natural frequencies

In my last post, I made a passing reference to Gerd Gigerenzer’s idea of using “natural frequencies” instead of probabilities to make assessing risks a little easier. My brief description of the idea did not really do justice to it, so here I will briefly outline an example from Gigerenzer’s book Reckoning With Risk.

The scenario posed is that you are conducting breast cancer screens using mammograms and you are presented with the following information and question about asymptomatic women between 40 and 50 who participate in the screening:

The probability that one of these women has breast cancer is 0.8%. If a woman has breast cancer, the probability is 90% that she will have a positive mammogram. If a woman does not have breast cancer, the probability is 7% that she will still have a positive mammogram. Imagine a woman who has a positive mammogram. What is the probability that she actually has breast cancer?

For those familiar with probability, this is a classic example of a problem that calls for the application of Bayes’ Theorem. However, for many people—not least doctors—it is not an easy question.

Gigerenzer posed exactly this problem to 24 German physicians with an average of 14 years professional experience, including radiologists, gynaecologists and dermatologists. By far the most common answer was that there was a 90% chance she had breast cancer and the majority put the odds at 50% or more.

In fact, the correct answer is only 9% (rounding to the nearest %). Only two of the doctors came up with the correct answer, although two others were very close. Overall, a “success” rate of less than 20% is quite striking, particularly given that one would expect doctors to be dealing with these sorts of risk assessments on a regular basis.

Gigerenzer’s hypothesis was that an alternative formulation would make the problem more accessible. So, he posed essentially the same question to a different set of 24 physicians (from a similar range of specialties with similar experience) in the following way:

Eight out of every 1,000 women have breast cancer. Of these 8 women with breast cancer, 7 will have a positive mammogram. Of the remaining 992 women who don’t have breast cancer, some 70 will still have a positive mammogram. Imagine a sample of women who have positive mammograms in screening. How many of these women actually have breast cancer?

Gigerenzer refers to this type of formulation as using “natural frequencies” rather than probabilities. Astute observers will note that there are some rounding differences between this question and the original one (e.g. 70 out of 992 false positives is actually a rate of 7.06% not 7%), but the differences are small.

Now a bit of work has already been done here to help you on the way to the right answer. It’s not too hard to see that there will be 77 positive mammograms (7 true positives plus 70 false positives) and of these only 7 actually have breast cancer. So, the chances of someone in this sample of positive screens actually having cancer is 7/77 = 9% (rounding to the nearest %).
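For comparison, the same answer drops out of a direct application of Bayes’ Theorem to the original probability formulation. Here is a small sketch in Python (the function and parameter names are mine, for illustration):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Theorem."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)
```

Calling `posterior(0.008, 0.9, 0.07)` gives roughly 0.094, agreeing (up to the rounding noted above) with the 7/77 ≈ 9% from the natural frequency version.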

Needless to say, far more of the doctors who were given this formulation got the right answer. There were still some errors, but this time only 5 of the 24 picked a number over 50% (what were they thinking?).

The lesson is that probability is a powerful but confusing tool and it pays to think carefully about how to frame statements about risk if you want people to draw accurate conclusions.