Visualizing smoking risk

Risk is something many people have a hard time thinking about clearly. Why is that? In his book Risk: The Science and Politics of Fear, subtitled “why we fear the things we shouldn’t–and put ourselves in greater danger”, Dan Gardner surveyed many of the theories that have been used to explain this phenomenon. They range from simple innumeracy and the influence of the media to the psychology of the short-cut “heuristics” (rules of thumb) we all use to make decisions quickly, but which can also lead us astray.

In Reckoning With Risk, Gerd Gigerenzer argues that the traditional formulation of probability is particularly unhelpful, making calculations even harder than they should be. Studies have shown that even doctors struggle to handle probabilities correctly when explaining the risks associated with illnesses and treatments. Gigerenzer instead proposes expressing risk in terms of “natural frequencies” (e.g. thinking of 8 patients out of 1,000 rather than a 0.8% probability), and tests with general practitioners suggest that this kind of re-framing can be very effective.
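Re-expressing a probability as a natural frequency is just a change of base. Here is a minimal sketch, assuming we round to the nearest whole case; the function name as_natural_frequency is hypothetical and not something from Gigerenzer’s book.

```python
def as_natural_frequency(probability: float, base: int = 1000) -> str:
    """Express a probability as 'k out of base', rounding k to a whole case."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    cases = round(probability * base)
    return f"{cases} out of {base:,}"


print(as_natural_frequency(0.008))          # 8 out of 1,000
print(as_natural_frequency(0.002, 10_000))  # 20 out of 10,000
```

Because of the rounding, very small probabilities need a larger base (10,000 rather than 1,000) to avoid collapsing to zero cases.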

The latest book on the subject that I have been reading is The Illusion of Certainty: Health Benefits and Risks by Erik Rifkin and Edward Bouwer. Rifkin and Bouwer are particularly critical of the common practice of reporting medical risks in relative rather than absolute terms. When news breaks that a new treatment reduces the risk of dying from condition X by 33%, should you be excited? That depends. It could mean that the (absolute) risk of dying from X is currently 15% and the treatment brings this down to 10%. That would be big news. However, if the death rate from X is currently 3 in 10,000 and the treatment brings this down to 2 in 10,000, then the (relative) risk reduction is still 33%, but the news is far less exciting: the absolute risk falls by only 1 in 10,000, compared with 5 in 100 in the first case.
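The arithmetic behind the two scenarios can be checked with a small sketch. Risks are expressed as probabilities, and risk_reduction is a hypothetical helper rather than anything from Rifkin and Bouwer’s book.

```python
def risk_reduction(before: float, after: float) -> tuple[float, float]:
    """Return (relative, absolute) reduction between two absolute risks."""
    absolute = before - after
    relative = absolute / before
    return relative, absolute


# Scenario 1: risk of dying from X falls from 15% to 10%.
rel, absolute = risk_reduction(0.15, 0.10)
print(f"relative {rel:.0%}, absolute {absolute * 10_000:.0f} in 10,000")
# relative 33%, absolute 500 in 10,000

# Scenario 2: risk falls from 3 in 10,000 to 2 in 10,000.
rel, absolute = risk_reduction(3 / 10_000, 2 / 10_000)
print(f"relative {rel:.0%}, absolute {absolute * 10_000:.0f} in 10,000")
# relative 33%, absolute 1 in 10,000
```

Quoting both numbers on the same base (here per 10,000) makes the 500-fold difference in absolute benefit hard to miss, even though the headline relative figure is identical.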

To make risk easier to perceive, Rifkin and Bouwer devised an interesting graphical device. They note that it is particularly difficult to conceive of and compare small risks, say a few cases in 1,000. In thinking about this problem, they came up with the idea of picturing a theatre with 1,000 seats and representing the cases as occupied seats in that theatre. They call the result a “Risk Characterization Theatre” (RCT). Here is an example illustrating a 2% risk, or 20 cases in 1,000.

Now, data visualization purists would be horrified by this picture. In The Visual Display of Quantitative Information, Edward Tufte argues that the “data-ink ratio” (the share of a graphic’s ink devoted to the data itself) should be as high as possible, yet the RCT uses a lot of ink just to display a single number! Still, I do think that the RCT can be an effective tool, and perhaps this can be justified by thinking of it as a way of visualizing numbers rather than data (but maybe that’s drawing a long bow).

Attractive though the theatre layout may be, there is probably no real need for the detail of the aisles, seating sections and labels, so here is a simpler version (again illustrating 20 in 1,000).
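A bare-bones version of this simpler layout can be produced as plain text. This is a sketch under assumptions of my own, not Rifkin and Bouwer’s design: 25 rows of 40 seats, “#” for an occupied seat, and randomly placed cases. The function name risk_theatre is hypothetical.

```python
import random
from typing import Optional


def risk_theatre(cases: int, seats: int = 1000, per_row: int = 40,
                 seed: Optional[int] = None) -> str:
    """Render a plain-text 'theatre' with one character per seat.

    '#' marks an occupied seat (a case) and '.' an empty one. The occupied
    seats are chosen at random, so only their number carries information.
    """
    if not 0 <= cases <= seats:
        raise ValueError("cases must be between 0 and seats")
    rng = random.Random(seed)
    occupied = set(rng.sample(range(seats), cases))
    rows = []
    for start in range(0, seats, per_row):
        end = min(start + per_row, seats)
        rows.append("".join("#" if i in occupied else "." for i in range(start, end)))
    return "\n".join(rows)


# 20 cases in 1,000 (2%), laid out as 25 rows of 40 seats.
print(risk_theatre(20, seed=1))
```

Scattering the cases at random, rather than filling the front rows, is a design choice: it avoids suggesting that position in the theatre means anything.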

To illustrate the use of RCTs, I’ll use one of the case studies from Rifkin and Bouwer’s book: smoking. One of the most significant studies of the health effects of smoking tracked the mortality of almost 35,000 British doctors (a mix of smokers and non-smokers). The study commenced in 1951, and the first results, published in 1954, indicated a significantly higher incidence of lung cancer among smokers. The study ultimately continued until 2001, and the final results were published in the 2004 paper Mortality in relation to smoking: 50 years’ observations on male British doctors.

The data clearly showed that, on average, smokers died earlier than non-smokers. The chart below would be the traditional way of visualizing this effect*.

Survival of doctors born between 1900 and 1930

While it may be clear from this chart that being a smoker is riskier than being a non-smoker, thinking in terms of percentage survival rates may not be intuitive for everyone. Here is how the same data would be illustrated using RCTs. Appropriately, the black squares indicate a death (and for those who prefer the original layout, there is also a theatre version).

Mortality of doctors born between 1900 and 1930

This is a rather striking chart. Looking particularly at the theatres for doctors up to 70 and 80 years old, the higher death rate among smokers is stark. However, the chart also highlights the inefficiency of the RCT: it in fact shows only 8 of the 12 data points in the original survival chart.
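Each theatre is a direct translation of a single point on the survival curve: a survival rate s among 1,000 doctors becomes round((1 − s) × 1,000) black squares. A minimal sketch; the helper name deaths_out_of is hypothetical, and the 75% survival rate is an illustrative value, not a figure from the British doctors study.

```python
def deaths_out_of(survival_rate: float, seats: int = 1000) -> int:
    """Convert a survival proportion into the number of black squares (deaths)."""
    if not 0.0 <= survival_rate <= 1.0:
        raise ValueError("survival_rate must lie between 0 and 1")
    return round((1.0 - survival_rate) * seats)


# Illustrative value only, not a figure from the study.
print(deaths_out_of(0.75))  # 250
```

Since every theatre encodes exactly one number, showing all of a survival curve’s points takes one theatre per point, which is where the inefficiency comes from.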

So, the Risk Characterization Theatre is an interesting idea that may be a useful tool for making numbers more concrete, but it is unlikely to be added to the arsenal of the serious data analyst.

As a final twist on the RCT, I have also designed a “Risk Characterization Stadium”, which could be used to visualize even lower risks. Here is an illustration of 20 cases in 10,000 (0.2%).
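The reason for moving to a bigger venue is resolution: a single seat is the smallest risk a venue can show. A quick sketch of that arithmetic, using a hypothetical helper seats_for_risk:

```python
def seats_for_risk(probability: float, capacity: int) -> int:
    """Number of occupied seats (rounded to whole seats) representing a risk."""
    return round(probability * capacity)


for venue, capacity in [("theatre", 1_000), ("stadium", 10_000)]:
    smallest = 1 / capacity  # one occupied seat is the finest resolution
    print(f"{venue}: 0.2% -> {seats_for_risk(0.002, capacity)} seats, "
          f"smallest non-zero risk = {smallest:.2%}")
# theatre: 0.2% -> 2 seats, smallest non-zero risk = 0.10%
# stadium: 0.2% -> 20 seats, smallest non-zero risk = 0.01%
```

In a 1,000-seat theatre a 0.2% risk would be just two occupied seats, which is hard to compare against anything; the stadium restores a visible count.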

* Note that the figures here differ slightly from those in Rifkin and Bouwer’s book. I have used data for doctors born between 1900 and 1930, whereas they refer to the 1900-1909 data but appear in fact to have used the 1910-1919 data.

13 thoughts on “Visualizing smoking risk”

Danny Yee 22 October 2010 at 12:11 am

If you used colours, you could represent data for all the age group deaths on the one pair of figures – e.g. red for deaths before age 50, brown for deaths between 50 and 60, etc. (Or whichever colour scheme is most effective.)

Or would this be too confusing?

Thomas 22 October 2010 at 4:55 am

This is quite an informative post as it touches upon my area of knowledge. A couple of random points:
- Expressing risk in terms of “natural frequencies”, while appealing to many readers, makes comparisons unintuitive and far from immediate. E.g. Study A indicates that the risk of developing a disease is 8 patients out of 1000. Study B says it’s 1 out of 120 patients. Percentages or proportions help by providing a common scale that makes the comparison immediate.
- One has to keep in mind the burden of a disease in the population and its prevalence. A small benefit for a rare disease is quite different from a small benefit for a more prevalent disease in terms of public policy, health economics, etc.
- I am not entirely convinced by the RCT approach (interestingly, RCT also stands for Randomized Controlled Trial…). It takes 8 panels to convey what a survival curve does effectively in one graph with two curves. And there is no quantification of survival; only a visual cue is provided by the filled squares.

Chris 22 October 2010 at 9:45 am

Well, I understand the convenience of the oval shape. I just hope it’s a cricket ground and not AFL. C

An Anonymous Zebra 22 October 2010 at 10:26 am

Great article! I have an idea for representing the smoking data that will leave you gasping, Mule! A pie chart where time = radians and you randomly pick points along each spoke to represent the relative number of smokers dying. Try and smooth it out.

Really, it’s a pi-chart version of representing all your data in a single RCT box, with time along the x-axis and randomly picked points on the y-axis to represent your data. This is instead of a straight relative frequency graph or bar chart. I wonder what it looks like? The additional detail is justified to draw the reader’s attention to the random nature of the outcome.

Stubborn Mule (post author) 22 October 2010 at 8:49 pm

@Danny: I have also been thinking about using a colouring scheme…stay tuned!

@Thomas: I did not really do justice to Gigerenzer’s notion in the brief comment in the post. As you say, to be useful the base needs to be the same when making comparisons. Of course, mathematically, expressing everything as, say, 10 in 1,000 or 50 in 1,000 rather than 1% and 5% makes no difference, but it seems to help when people actually need to calculate risks. I will post a brief example shortly.

Your point about the difference in impact of treatment for rare versus common diseases is a good one, and one which is lost when only relative risks are quoted, as is so often the case. This is a point Rifkin and Bouwer make repeatedly in their book.

Like you, I have reservations about the inefficiency of the RCT. Nevertheless, judging by people’s reactions to the graphics, RCTs do seem to communicate at more of a “gut level” than mere numbers or even a line graph. My conclusion is that they have their place, but should be used with care, taking into account the target audience. It would also be worth adding the numbers: you are right that it is not really possible to read them from the chart. Here is an alternative smoking chart which uses the “classic” theatre layout and adds the numbers.

@Chris: don’t expect me to tell the difference between a cricket stadium and an AFL stadium!

@James: interesting idea…I’ll have a crack at that too.

Stubborn Mule (post author) 22 October 2010 at 9:29 pm

@Thomas: here is a bit more about natural frequencies.

Pingback: Natural frequencies

Stubborn Mule (post author) 22 October 2010 at 11:07 pm

@Danny: here is a first cut of a colour-coded graphic. The idea has potential, but here it may be a little too confusing.

Pingback: Generate your own Risk Characterization Theatre

pfh007 26 October 2010 at 10:38 pm

The difficulty with using smoking as the example is that, from personal experience, I can attest that however effective the method of presenting smoking-related data, it is no match for the seductive powers of nicotine addiction.

Fortunately, I was able to kick the habit before diseased body parts became compulsory viewing for the average smoker. That people will tolerate those images in pursuit of the self-administration of nicotine suggests that the method of presenting risk information may not make a significant difference to them.

Also, given that the study in question followed doctors who smoked, it would be interesting to know whether the survival rates were communicated back to the subjects during the course of the study. If so, the fact that some doctors continued to smoke up to and beyond the age of 80 is quite astounding.

When I was young our family GP puffed like a train during consults. I don’t think he made it to 80.

Pingback: Bicycle Parking At The Risk Obfuscation Theatre « Conflated Automatons

Pingback: Micromorts

Pingback: R-ohjelmointi.org » Blog Archive » Riskin visualisointi (“Visualizing risk”)

Stubborn Mule

Obstinately objective

Visualizing smoking risk

Possibly Related Posts (automatically generated):

Like this:

Related

13 thoughts on “Visualizing smoking risk”

Leave a Reply Cancel reply

Possibly Related Posts (automatically generated):

Share this post:

Like this:

Related

13 thoughts on “Visualizing smoking risk”

Leave a Reply Cancel reply