Last month I wrote about the dangers of secondary axes, but even charts with a single axis can be deceiving. I have been reflecting on this after reading Jon Peltier’s critique of Microsoft’s “professional” charting tutorials earlier this week. One of the charts Peltier takes issue with is a column chart which has the value axis starting at 100 rather than zero. He writes:
This is a major chart fail. The value axis on a column or bar chart should always include zero. Always. If you want to expand the scale to help resolve the values, then a column chart is not the right chart type.
This chart may do a good job of highlighting the Wall Street Journal’s leading position, thereby supporting Silicon Alley’s headline “The Journal Has The Richest Readership Among Print Pubs”. But it also gives a distorted impression of just how solid the Journal’s lead is. Starting the income axis at zero, shown in the chart below, gives a rather different impression. The Wall Street Journal still sits at the top, but the variation across the titles looks much smaller than the original chart suggested.
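To put a number on the distortion, here is a rough sketch in Python. The two values are hypothetical, chosen only for the arithmetic, and are not the figures from the readership chart; `apparent_ratio` is a helper name made up for this illustration. It compares the ratio of the two bar lengths as drawn when the axis starts at zero and when it starts at 100.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when the value axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values, NOT the data behind the chart discussed above.
top, bottom = 200.0, 150.0

true_ratio = apparent_ratio(top, bottom)                       # axis starts at zero
truncated_ratio = apparent_ratio(top, bottom, baseline=100.0)  # axis starts at 100

print(f"bars drawn from zero: {true_ratio:.2f} times as long")
print(f"bars drawn from 100:  {truncated_ratio:.2f} times as long")
```

With these made-up numbers the taller bar is a third longer when drawn from zero, but twice as long when drawn from 100, even though the underlying data have not changed.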
Nevertheless, precisely because it displays less variation in the data, the zero-based chart does seem less useful, and the values are harder to read. Commenting on Peltier’s post and musing on my posterous Extras blog, I wondered whether starting axes with zero should be considered an inviolable rule of charting. One of the gurus of data visualisation is William S. Cleveland. In his book “The Elements of Graphing Data” he gives this advice: “Do not insist that zero always be included on a scale showing magnitude”. He goes on to make this argument:
At first glance this would seem to get the Silicon Alley Insider out of chart jail. But the story does not end there. Cleveland’s book focuses on scientific charts, particularly line charts and scatterplots (also known as X-Y plots), and there is scarcely a bar or column chart to be found. Furthermore, in his paper “Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging” (The American Statistician, November 1984), he makes the following observations:
Cleveland’s solution to showing data variation without having bar lengths deceive was to invent a new type of chart: the “dot plot”. Dot plots, which I have used here on the Stubborn Mule to illustrate statistics on asylum-seekers and universities, use position alone to encode the data. Because there is no bar whose length the eye can compare, it is much safer to drop zero from the axis. Although rather tricky to produce using Microsoft Excel (I use the R statistical package), they are a good substitute for bar and column charts. This BeyeNetwork article goes into more detail about dot plots, including the use of multi-panel plots, which I will look at in a future post.
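The idea of encoding by position alone can be sketched in a few lines of plain Python. This is only a text-mode illustration, not the R code behind my own charts; the function name `text_dot_plot`, the labels and the values are all hypothetical. Each value is drawn as a single `o` whose column depends on where the value falls between the chosen axis limits, so nothing in the picture has a length that could be misread.

```python
def text_dot_plot(data, axis_min, axis_max, width=41):
    """Return one text line per label, largest value first.

    Each line places a single 'o' at the column matching the value's
    position between axis_min and axis_max. The row of '.' characters
    is only a guide line; it carries no information.
    """
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        fraction = (value - axis_min) / (axis_max - axis_min)
        column = round(fraction * (width - 1))
        column = max(0, min(width - 1, column))  # clamp values outside the axis
        row = ["."] * width
        row[column] = "o"
        lines.append(f"{label.ljust(label_width)} |{''.join(row)}| {value:g}")
    return lines


# Hypothetical labels and values, for illustration only.
sample = {"A": 150, "B": 200, "C": 175}

# The axis runs from 100 to 200, so zero is deliberately left out.
for line in text_dot_plot(sample, axis_min=100, axis_max=200):
    print(line)
```

Changing `axis_min` moves the dots but never changes how much ink each one uses, which is the property that makes a non-zero axis less risky here than in a column chart.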
Now you know to be vigilant against the deceptive use of axis scales.
Really nice demonstration of the reasoning behind dot plots. It’s quite impressive how well Schmid’s theory plays out; I find it much easier to follow the curve of the dots than the end point of the bars, even compared to the first graph.
ha ha ! here’s the facebook group just for this http://www.facebook.com/group.php?gid=50953581261&ref=ts
Keep the nuggets (of gold) coming Mule. Very insightful and interesting to read.
Even the Beeb is getting in on the act discussing charts and graphs…