Update: regular commenter Mark L has helpfully identified the reason behind the apparent anomaly in the statistics that motivated me to write this post. I had misinterpreted one of the statistics. While this takes the mystery out of the numbers, it does highlight how tricky it can be to get to grips with the statistics of medical tests. I have edited this post to correct that misinterpretation. I decided not to use the strikethrough editing approach popular among bloggers as the content can be confusing enough already!
Despite the fact that more banks have been failing (Bradford & Bingley, Wachovia, Hypo Real Estate, Fortis,…), in this post I will continue to stay away from the subject of the financial markets and will instead look at some mathematics, trying not to lose too many readers in the process.
Recently I was contemplating results from an “NT-plus” test, which combines ultrasound measurements of the nuchal translucency with maternal blood tests to provide a screening test for various chromosomal abnormalities, particularly Down Syndrome. Tests of this type abound with statistics and the mathematician in me could not resist crunching the numbers a little to get a better understanding of the test.
Although I did my calculations using Bayes’ Formula (see below), here I will try to avoid formulas and instead work with “natural frequencies”. This approach, used in Gerd Gigerenzer’s book Reckoning With Risk, amounts to the same thing but is generally easier to understand.
To begin, I will explain some of the medical testing terminology. I will stick to the example of the NT-plus test and will divide the population into four categories based on whether the fetus has Down Syndrome or not and whether the NT-plus test gives a “high risk” or “low risk” result. Note that the terms “high risk” and “low risk” are used because this test (like most medical tests) is not definitive. Nevertheless, a high risk result is typically referred to as testing “positive” and low risk as testing “negative”.
- Population Rate: the overall rate of cases with Down Syndrome. For Down Syndrome, the population rate is somewhere between 1 in 800 and 1 in 1500. Here I will work with a rate of 1 in 1000.
- Detection Rate: the rate of Down Syndrome cases the test identifies as “high risk”, also known as the “sensitivity” of the test. The NT-plus test has a detection rate of approximately 90%.
- False Positive Rate: the rate of cases without Down Syndrome that are identified as “high risk”. The false positive rate for the NT-plus is approximately 5%.
- High Risk Rate: the rate of cases classified as high risk that actually have Down Syndrome. This is not generally quoted.
- Cut-Off Risk: the threshold for a “high risk” result. The cut-off for a high risk result for the NT-plus test is quoted as 1 in 300.
If we know the four numbers a, b, c and d, the counts in each of the four categories, we will be able to calculate most of these statistics for the NT-plus test. Furthermore, since we are only really interested in frequencies, the population size a+b+c+d is irrelevant and can be scaled up and down, so only three pieces of information are really needed to work everything else out. This means that armed with the population rate, the detection rate and the false positive rate, we can determine the high risk rate.
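The point about scaling can be checked with a few lines of Python. This is only a sketch: the function name, the argument names and the counts are illustrative choices of mine, and I am not assuming any particular assignment of the letters a, b, c, d to the four categories.

```python
from fractions import Fraction

def rates_from_counts(true_pos, false_neg, false_pos, true_neg):
    """Return the three headline rates from the four category counts.

    The argument names are illustrative labels for the four categories:
    affected/high risk, affected/low risk, unaffected/high risk and
    unaffected/low risk.
    """
    affected = true_pos + false_neg
    unaffected = false_pos + true_neg
    total = affected + unaffected
    return {
        "population_rate": Fraction(affected, total),
        "detection_rate": Fraction(true_pos, affected),
        "false_positive_rate": Fraction(false_pos, unaffected),
    }

# Made-up counts, then the same counts with every category multiplied by 10.
small = rates_from_counts(9, 1, 50, 940)
large = rates_from_counts(90, 10, 500, 9400)

# The rates are identical, so the population size carries no information.
print(small == large)
```

Exact fractions are used rather than floating point so that the comparison between the two populations is not affected by rounding.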
To make the numbers work out nicely, I’ll use a hypothetical total population of 20,000 of which 20 have Down Syndrome (working with the population rate of 1/1000). Using the detection rate of 90%, these are split into 18 with a high risk result and 2 with a low risk result.
Of the 19,980 who don’t have Down Syndrome, there is a false positive rate of 5%, which means that 5% × 19,980 = 999 have a high risk test result and the remaining 19,980 − 999 = 18,981 have a low risk result.
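For anyone who would rather check the natural frequencies by machine, here is a minimal sketch of the same arithmetic. The variable names are my own, and integer arithmetic is used so that the percentages divide exactly.

```python
# Hypothetical population from the worked example above.
population = 20_000

# Population rate of 1 in 1000.
affected = population // 1000          # have Down Syndrome
unaffected = population - affected     # do not have Down Syndrome

# Detection rate of 90% applied to the affected group.
true_pos = affected * 90 // 100        # affected, high risk result
false_neg = affected - true_pos        # affected, low risk result

# False positive rate of 5% applied to the unaffected group.
false_pos = unaffected * 5 // 100      # unaffected, high risk result
true_neg = unaffected - false_pos      # unaffected, low risk result

print(f"{'':12}{'High risk':>10}{'Low risk':>10}")
print(f"{'Down Synd.':12}{true_pos:>10,}{false_neg:>10,}")
print(f"{'No Down S.':12}{false_pos:>10,}{true_neg:>10,}")
```

The four printed counts are the four categories of the population and add back up to 20,000.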
Now we can work out the high risk rate, which is the chance of having Down Syndrome given a high risk result. In this example it is 18 / (18 + 999) = 1/56.5. Note that these odds are far shorter than the 1/300 cut-off risk, the only piece of information we have not yet used. That is because 1/300 represents the Down Syndrome risk for cases that just fall into the high risk category. There would be some patients in the high risk group with a far higher risk of Down Syndrome, and the 1/56.5 high risk rate can be thought of as the average of all of these. Initially I confused the cut-off risk and the high risk rate, and so was surprised by the 1/56.5 figure.
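The 1/56.5 figure is easy to reproduce. The sketch below uses the two high risk counts from the worked example, with variable names of my own choosing, and compares the result with the quoted 1 in 300 cut-off.

```python
from fractions import Fraction

# High risk counts from the worked example: 18 with Down Syndrome, 999 without.
true_pos, false_pos = 18, 999

# Share of high risk results that actually have Down Syndrome.
high_risk_rate = Fraction(true_pos, true_pos + false_pos)
one_in = 1 / high_risk_rate

print(f"High risk rate: 1 in {float(one_in):.1f}")

# The average risk across the high risk group exceeds the 1 in 300 threshold.
cut_off = Fraction(1, 300)
print(high_risk_rate > cut_off)
```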
Unfortunately, calculating the cut-off risk is more complicated than the other statistics and cannot be done using this 2×2 matrix approach. It requires some information about the distribution of the test scores in the high risk group.
For anyone interested in experimenting with the numbers, I have published a Google Docs spreadsheet which uses Bayes’ Formula. Having got this far without a formula, for the enthusiasts I should say what Bayes’ Formula actually is. This result, named after the mathematician Thomas Bayes, provides a relationship between conditional probabilities such as “the chances of having a disease given a positive test result”:
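Written out in LaTeX, in the expanded form that uses the population rate, detection rate and false positive rate, the formula reads as follows. The notation A' for “not A” is my choice here for convenience.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')}
```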
Here P( ) denotes “the probability of” (e.g. P(A) denotes the probability of event A occurring). The vertical bar is used to denote “conditional upon”, so P(A|B) denotes the probability of event A occurring given that event B has occurred. To interpret this in the case of the NT-plus test, take A to represent “the fetus has Down Syndrome”, A′ “the fetus does not have Down Syndrome” and B “a high-risk NT-plus test result”. Then P(A|B) is the high risk rate (the probability of Down Syndrome given a positive test), P(A) is the population rate, P(B|A) is the detection rate and P(B|A′) is the false positive rate. Note that P(A′) = 1 – P(A).
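As a cross-check, Bayes’ Formula can be applied directly to the three rates used in this post. This is a minimal sketch rather than a copy of the spreadsheet: the function and parameter names are my own.

```python
from fractions import Fraction

def posterior(prior, detection_rate, false_positive_rate):
    """Bayes' Formula: probability of the condition given a high risk result."""
    true_pos = detection_rate * prior                # P(B|A) * P(A)
    false_pos = false_positive_rate * (1 - prior)    # P(B|not A) * P(not A)
    return true_pos / (true_pos + false_pos)

# Rates quoted in this post: 1 in 1000, 90% and 5%.
p = posterior(Fraction(1, 1000), Fraction(9, 10), Fraction(1, 20))
print(f"P(Down Syndrome | high risk) = 1 in {float(1 / p):.1f}")
```

With those inputs the result agrees with the natural frequency calculation, 18 / (18 + 999), which is what “amounts to the same thing” means in practice.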
UPDATE: the Sydney Ultrasound for Women (SUfW) clinic has published statistics from their NT-plus program. From 1996 to 2004 their program (which includes both NT-plus and, earlier in the program, NT alone) tested 48,265 women and identified 144 cases of Down Syndrome. In this sample 5.9% tested “high risk”, which amounts to 2,848 women. These figures suggest a high risk rate of 144/2,848 or approximately 1/20. The difference from the 1/56.5 figure calculated earlier can be explained because the overall population rate of Down Syndrome in the SUfW sample was a fairly high 1 in 290, possibly reflecting a bias in the group to older women.
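The SUfW arithmetic can be reproduced in the same way. The sketch below simply follows the calculation in the paragraph above, dividing the 144 identified cases by the number of high risk results. The variable names are mine.

```python
# Figures quoted from the SUfW program, 1996 to 2004.
women_tested = 48_265
down_syndrome_cases = 144
high_risk_fraction = 0.059   # 5.9% tested "high risk"

# Number of women with a high risk result, rounded to whole women.
high_risk_women = round(women_tested * high_risk_fraction)

# High risk rate expressed as "1 in N".
one_in = high_risk_women / down_syndrome_cases

print(f"{high_risk_women:,} high risk results")
print(f"High risk rate: about 1 in {one_in:.1f}")
```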