Deceptive Charts #2

Last month I wrote about the dangers of secondary axes, but even charts with a single axis can be deceiving. I have been reflecting on this after reading Jon Peltier’s critique of Microsoft’s “professional” charting tutorials earlier this week. One of the charts Peltier takes issue with is a column chart which has the value axis starting at 100 rather than zero. He writes:

This is a major chart fail. The value axis on a column or bar chart should always include zero. Always. If you want to expand the scale to help resolve the values, then a column chart is not the right chart type.

In today’s Silicon Alley Insider chart of the day I saw exactly the same chart fail. The chart, which I have reproduced below, shows the ranking of various magazines and newspapers by the wealth of their readers (the data is taken from a report by B to B Media Business).

Median Income of Readers – Silicon Alley Version

This chart may do a good job of highlighting the Wall Street Journal’s leading position, thereby supporting Silicon Alley’s headline “The Journal Has The Richest Readership Among Print Pubs”. But it also gives a distorted impression of just how solid the Journal’s lead is. Starting the income axis at zero, as in the chart below, gives a rather different impression. The Wall Street Journal still sits at the top, but the variation across the titles is much smaller than the original chart suggested.

Median Income of Readers – Zero-based Version
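To put a number on the distortion, here is a minimal Python sketch of the arithmetic. The two values are made up for illustration (they are not the readership figures), the baseline of 100 simply echoes the axis Peltier complained about, and `bar_length_ratio` is a name I have invented for the example.

```python
def bar_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b.

    A bar's length is its value minus the point where the axis starts,
    so the ratio the eye sees depends on the baseline.
    """
    if min(a, b) <= baseline:
        raise ValueError("both values must exceed the baseline")
    return (a - baseline) / (b - baseline)


# Hypothetical values, NOT the data from the Silicon Alley chart.
top, runner_up = 120.0, 110.0

true_ratio = bar_length_ratio(top, runner_up)          # axis starts at zero
drawn_ratio = bar_length_ratio(top, runner_up, 100.0)  # axis starts at 100

print(f"zero-based: {true_ratio:.2f}x, truncated: {drawn_ratio:.2f}x")
# prints "zero-based: 1.09x, truncated: 2.00x"
```

With these numbers a lead of about 9% is drawn as a bar twice as long as its neighbour, which is exactly the kind of exaggeration a truncated column chart produces.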

Nevertheless, precisely because it displays less variation in the data, the zero-based chart does seem less useful and it is harder to read the values. Commenting on Peltier’s post and musing on my posterous Extras blog, I wondered whether starting axes with zero should be considered an inviolable rule of charting. One of the gurus of data visualisation is William S. Cleveland. In his book “The Elements of Graphing Data” he gives this advice: “Do not insist that zero always be included on a scale showing magnitude”. He goes on to make this argument:

For graphical communication in science and technology assume the viewer will look at the tick mark labels and understand them. Were we not able to make this assumption, graphical communication would be far less useful. If zero can be included on a scale without wasting undue space, then it is reasonable to include it, but never at the expense of resolution.

At first glance this would seem to get the Silicon Alley Insider out of chart jail. But the story does not end there. Cleveland’s book focuses on scientific charts, particularly line charts and scatterplots (also known as X-Y plots), and there is scarcely a bar or column chart to be found. Furthermore, in his paper “Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging” (The American Statistician, November 1984), he makes the following observations:

The bar of a bar chart has two aspects that can be used to visually decode quantitative information—size (length and area) and the relative position of the end of the bar along the common scale. The changing sizes of the bars is an important and imposing visual factor; thus it is important that size encode something meaningful. The sizes of bars encode the magnitudes of deviations from the baseline. If the deviations have no important interpretation, the changing sizes are wasted energy and even have the potential to mislead (Schmid 1983).

Cleveland’s solution to showing data variation without having bar lengths deceive was to invent a new type of chart: the “dot plot”. Dot plots, which I have used here on the Stubborn Mule to illustrate statistics on asylum-seekers and universities, use position alone to encode the data. This means that it is much safer to drop zero from the axis. Although rather tricky to produce using Microsoft Excel (I use the R package), they are a good substitute for bar and column charts. This BeyeNetwork article goes into more detail about dot plots, including the use of multi-panel plots, which I will look at in a future post.

So, here is a dot plot version of the newspaper and magazine rankings by reader income.

Median Income of Readers – Dot Plot Version
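For anyone who wants to see the idea without any charting software, the following is a rough text-only sketch in Python. It is not how the chart above was produced (that was done in R). The function name `dot_plot`, the titles and the values are all hypothetical. The point is only that each item is encoded by the position of a single marker on a common scale, so the scale is free to start at 100 rather than zero.

```python
def dot_plot(data, axis_min, axis_max, width=31):
    """Return text lines with one 'o' per item, placed by value on a common scale."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    label_width = max(len(name) for name in data)
    lines = []
    # Largest value first, as in a ranking chart.
    for name, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        if not axis_min <= value <= axis_max:
            raise ValueError(f"{name} lies outside the axis range")
        # Map the value to a column; only this position carries information.
        col = round((value - axis_min) / (axis_max - axis_min) * (width - 1))
        row = "." * col + "o" + "." * (width - 1 - col)
        lines.append(f"{name:<{label_width}} |{row}|")
    # Label both ends of the scale so the non-zero start is explicit.
    left, right = f"{axis_min:g}", f"{axis_max:g}"
    gap = " " * max(1, width - len(left) - len(right))
    lines.append(" " * (label_width + 2) + left + gap + right)
    return lines


# Hypothetical titles and values, NOT the data from the chart above.
example = {"Title A": 150, "Title B": 120, "Title C": 110}
lines = dot_plot(example, axis_min=100, axis_max=160)
for line in lines:
    print(line)
```

The dotted guide lines run the full width of every row, so there is no bar length for the eye to compare, only the position of each marker.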

Now you know to be vigilant against the deceptive use of axis scales.

4 thoughts on “Deceptive Charts #2”

mark 20 November 2009 at 5:36 pm

Really nice demonstration of the reasoning behind dot plots. It’s quite impressive how well Schmid’s theory plays out; I find it much easier to follow the curve of the dots than the end point of the bars, even compared to the first graph.

gerd schenkel 22 November 2009 at 10:31 pm

ha ha ! here’s the facebook group just for this http://www.facebook.com/group.php?gid=50953581261&ref=ts

CV 28 November 2009 at 7:57 pm

Keep the nuggets (of gold) coming Mule. Very insightful and interesting to read.

Another Anonymous Coward 30 November 2009 at 3:05 pm

Even the Beeb is getting in on the act discussing charts and graphs…
http://news.bbc.co.uk/2/hi/uk_news/magazine/8381597.stm

Stubborn Mule

Obstinately objective
