Category Archives: charts

Protovis now working in Chrome and Safari

Thanks to everyone who responded to my experimental Protovis post*, whether in the survey, via twitter or in comments on the post. It quickly became clear that my trick for including the code to generate the chart completely failed to work in Chrome and Safari browsers. I still do not fully understand why that is, but I have now worked out a completely different approach to the problem which (fingers crossed) seems to work in more browsers, although I still cannot vouch for all versions of Internet Explorer.

So here is the chart one more time. I hope it now works for (almost) everyone!

[pvis src=”http://stubbornmule.net/scripts/pv/test.js” img=”/blog/wp-content/PV-CDO-circles.png” height=”125px”]CDO deals: total and recycled[/pvis]
I will also be updating the howto post very shortly to explain my new technique.

UPDATE: at first this trick was not working on mobile devices, but it should now be working on mobile devices except for Android. The only remaining problem is IE, and I do not think getting it to work there will be possible. I will instead try to make it fail more gracefully on IE.

* Protovis is a javascript data visualisation library being developed at Stanford, which allows the creation of interactive charts on web pages.

Experimenting with Protovis

A couple of weeks ago I gave a talk on using graphics in R. During the question session, someone asked whether I had tried using Protovis, a javascript data visualisation library being developed at Stanford. It was an easy question to answer: no!

However, a bit of subsequent investigation revealed that Protovis has been developed very much in the spirit of Leland Wilkinson’s book The Grammar of Graphics, which I am currently reading, so I have decided to experiment with it here on the blog.

The charts I generate with R are all static images, while a tool like Protovis allows for user interaction which opens up some interesting possibilities. Compared to R, which I have been using for around 10 years, Protovis presents a double challenge: not only do I have to come to grips with Protovis itself, but I will also have to learn some basic Javascript programming. So, I expect it to be a slow journey.

As a tentative first step, I have reproduced the CDO chart from a recent post ranting about bubble charts. At first glance, it is essentially identical to the chart I produced using R. However, if you hover your mouse over the points on the chart, you should see the figures appear! It is by no means perfect (for example, it would probably look better if the figure appeared for a single point at a time, rather than for every point on the chart, and it could do with a legend), but it’s a start and I will persevere.

[pvis src=”http://stubbornmule.net/scripts/pv/test.js” img=”http://stubbornmule.net/blog/wp-content/CDO-circles.png” height=”125px”]CDO deals: total and recycled[/pvis]

Producing charts using a Javascript library does have its drawbacks. For a start, it means the chart will only be visible when scripts can be run, so if you are reading this in an email or an RSS news reader, you will probably not see very much and will have to visit the page on the blog to see it. Even then, some of you may use script-blockers such as NoScript which will also break the chart (mind you, you can trust the Mule, so you could always whitelist this site!). Finally, I believe that some older browsers (such as IE6) will not support Protovis. It would be useful to see how many people can or cannot see the chart, so please let me know using this poll whether you can see the chart.



Getting Protovis to work on the blog was a little fiddly, so for anyone interested, I have also written up a quick guide to using Protovis on a WordPress blog.

UPDATE: Reports in so far indicate that the chart is not working in Google Chrome or on mobile devices. More work to do, it would seem!

Junk Charts #4 – Puns are dangerous

Design guru Edward Tufte famously lambasted pie charts in The Visual Display of Quantitative Information and went on to say

the only worse design than a pie chart is several of them

While pie charts do have their defenders, the basis for the contempt in which pie charts are held by Tufte and others is that the human eye is far better at differentiating position and length than angle and area.

Circular CDOs

So, I was a little disappointed when a correspondent drew my attention to this rather bubbly chart which appeared in an article by the excellent team at Pro-Publica (click on the chart to see a larger version).

Pro-Publica is an independent, not-for-profit newsroom that specialises in investigative journalism. They have collaborated with the team at Planet Money (one of my favourite podcasts), and have perhaps delved deeper than any other journalists into the arcane world of CDOs, a topic I have touched on a few times here on the Stubborn Mule.

The chart, attributed to Thetica Systems, was used to accompany an article by Pro-Publica exposing the fact that, in their words,

Over the last two years of the housing bubble, Wall Street bankers perpetrated one of the greatest episodes of self-dealing in financial history.

It is a fascinating story, but it would seem that Thetica’s graphics department was carried away with a visual pun on the title of Pro-Publica’s post “Circular CDOs” when they chose to use circles to depict the growth in CDO recycling from 2005 to 2007. It might look pretty, but the circles make it much harder to discern the trend and to compare the four banks. Pro-Publica’s article deserves better.

In the tradition of my junk chart posts, I have produced an alternative visualization of the same data. I am sure that graphic designers could improve on the colour-scheme, but this simple lattice of line charts makes for a much clearer view of the data.

CDO Self-Dealing by investment banks (2005-2007)
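For anyone wanting to try a lattice of line charts, here is a minimal sketch. It assumes the lattice package (which ships with standard R installations) and uses entirely made-up figures for four hypothetical banks, so it illustrates the layout only and is not necessarily how the chart above was produced.

```r
library(lattice)  # ships with standard R distributions

# Entirely made-up figures, for illustration only (not the Pro-Publica data).
cdo <- expand.grid(year = 2005:2007,
                   bank = c("Bank A", "Bank B", "Bank C", "Bank D"))
cdo$recycled_pct <- c(5, 12, 30, 8, 15, 22, 3, 9, 18, 10, 14, 35)

# One panel per bank, all sharing a common vertical scale for easy comparison.
p <- xyplot(recycled_pct ~ year | bank, data = cdo, type = "b",
            layout = c(4, 1), xlab = "Year", ylab = "Recycled share (%)",
            scales = list(x = list(at = 2005:2007)))
print(p)  # lattice plots must be printed explicitly when run from a script
```

Because every panel shares the same axes, the eye compares positions along a common scale rather than the areas of circles.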

If this post has given you a taste for de-junking charts, you should also visit the Junk Charts blog for much, much more.

The Mule goes SURFing

A month ago I posted about “SURF”, the newly-established Sydney R user forum (R being an excellent open-source statistics tool). Shortly after publishing that post, I attended the inaugural forum meeting.

While we waited for attendees to arrive, a few people introduced themselves, explaining why they were interested in R and how much experience they had with the system. I was surprised at the diversity of backgrounds represented: there was someone from the department of immigration, a few from various areas within the health-care industry, a group from the Australian Copyright Council (I think I’ve got that right—it was certainly something to do with copyright), a few from finance, some academics and even someone from the office of state revenue.

Of the 30 or so people who came to the meeting, many classed themselves as beginners when it came to R (although most had experience with other systems, such as SAS). So if there’s anyone out there who was toying with the idea of signing up but hesitated out of concern that they know nothing about R, do not fear. You will not be alone.

The forum organizer, Eugene Dubossarsky, proceeded to give an overview of the recent growth in R’s popularity and also gave a live demo of how quickly and easily you can get R installed and running. Since there were so many beginners, Eugene suggested that a few of the more experienced users could act as mentors to those interested in learning more about R. As someone who has used R for over 10 years, I volunteered my services. So feel free to ask me any and all of your R questions!

As well as being a volunteer mentor, I will have the pleasure of being the presenter at the next forum meeting on the 18th of August. Regular readers of the Stubborn Mule will not be surprised to learn that the topic I have chosen is The Power of Graphics in R. Here’s the overview of what I will be talking about:

In addition to its statistical computing prowess, R is one of the most sophisticated and flexible tools around for visualizing quantitative data. It can produce a wide variety of chart types, including scatter plots, box plots, dot plots, mosaic plots, 3D charts and more. Tweaking chart settings and adding customized annotations is a breeze and the charts can readily be output to a range of formats including images (jpeg or png), PDF and metafile formats.

Topics covered in this talk include:

  • Getting started with graphing in R
  • The basic charting types available
  • Customising charts (labels, axes, colour, annotations and more)
  • Managing different output formats
  • A look at the more advanced charting packages: lattice and ggplot2

Anyone who ever has a need to visualize their data, whether simply for exploration or for producing slick graphics for reports and presentations can benefit from learning to use R’s graphics features. The material presented here will get you well on your way. If you have ever been frustrated when trying to get charts in Excel to behave themselves, you will never look back once you switch to R.

For those of you in Sydney who are interested in a glimpse of how I use R to produce the charts you see here on the blog, feel free to come along. I hope to see you there!

Graphing using R

Long-time readers of the Stubborn Mule will know that charts are a regular feature here. Almost all of these charts were produced using the R statistical software package which, in my view, produces far superior results to the most commonly used graphing tool: Excel. As a community service to help rid the world of horrible Excel charts, here is a quick tutorial on charting using R. Since R is a powerful and versatile tool, there is a lot more to it than covered here, so there may be more tutorials to come.

Installing and Running R

The first step is to get R installed on your computer. R is open source and can be downloaded for free from the Comprehensive R Archive Network (CRAN). It comes in many flavours: Mac, Windows and Linux.

Once you have installed R and have fired it up, you are presented with something that looks very different to Excel. This is the first indication that R is an interactive programming environment, not a spreadsheet. You will see various messages, including copyright information and some instructions on how to display licence information, run a demo and get help, and finally you are presented with a command prompt: “>”. R is now waiting for you to type commands.

As an example, try entering the following command:

getwd()

This will display the current “working directory” (hence “wd”), which is the default folder that R will use for reading and writing data. You can easily change the working directory, either by using the drop-down menus (which menu option varies depending on whether you are using Windows, Mac or Linux) or by using the setwd command:

setwd("/Documents/Mule Docs")

Unless you have a “Mule Docs” folder in a “Documents” folder, you will need to substitute the name of one of your own folders, otherwise you will get an error message. Note that you need to use forward slashes (“/”) rather than backslashes (“\”) even on Windows.

You can see detailed explanations of any R command by prefixing the name of the command with a question mark:

?setwd

This is short for help(setwd). Of course, this assumes you know the name of the command already. To search the documentation for a keyword, use a double question-mark. For example

??median

will show a list of all the commands which feature the word “median” in their documentation. This is short for help.search("median"). Note the use of double quotes (") here, not required in the ?? syntax.

Reading Data and Charting

To get started, here is a simple data file in CSV format (“comma separated values”). Download it and save it in your working directory (or save it somewhere else and then change R’s working directory to where you just saved the file). You can then load the data into R with the following command:

x <- read.csv("demo.csv")

While the read.csv part is self-explanatory, the “<-” may look a little odd. It is the assignment operator. Whereas most programming languages simply use an “=” to assign to variables, R uses what is intended to look like an arrow. In this case, you should interpret the command as saying “read the contents of the file demo.csv and place the result in the variable x”. To see the contents of x, you can simply type x at the command line and press return, which will display a table with all the data read from the demo.csv file. When dealing with larger “data frames” (to use the R lingo for this type of object), having that much data flash by may not be very useful. Some other useful commands for quickly inspecting your data are:

head(x)
tail(x)
summary(x)

Now you are ready for your first graph. Try this command:

plot(x)

You should see a simple, clean scatter-plot. If you would prefer a line graph, this is easily done too.

plot(x, type="l")

The plot function has many options, which you can explore in the documentation (just enter ?plot). There are also various commands for further annotations for your chart. Try the following commands:

grid()
axis(side=4)
text(2, -4, "Random Walk")

These will add gridlines, put axis labels on the right-hand side (R numbers chart sides from 1 to 4 starting from the bottom and working clockwise) and finally display text on the chart.
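If you do not have the data file to hand, the steps so far can be tried on simulated data. The sketch below is only a stand-in: the two-column data frame, its column names and the position of the text label are all assumptions, not the contents of demo.csv.

```r
# Hypothetical stand-in for demo.csv: a simulated random walk (assumed structure).
set.seed(42)                                 # make the simulated data reproducible
x <- data.frame(t = 1:100, value = cumsum(rnorm(100)))

head(x)                                      # first six rows
summary(x)                                   # quick numeric summary of each column

plot(x, type = "l")                          # line graph of value against t
grid()                                       # add gridlines
axis(side = 4)                               # repeat the axis labels on the right-hand side
text(20, max(x$value), "Random Walk")        # place a label at chart coordinates
```

Note that the first two arguments to text are chart coordinates, so they need to fall within the range of the plotted data for the label to be visible.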

Using Program Files

Using R interactively like this is useful for familiarising yourself with the system and for performing quick calculations, but if you find yourself wanting to make small changes here and there, it will quickly become annoying re-typing long commands. This is when you should move to using program files. All that this involves is saving a series of R commands to a file using a text editor (you can just use a simple text editor like Notepad or TextEdit, but many fancier applications can help out by automatically highlighting R commands in different colours, a trick known as “syntax highlighting”). Here is one I prepared earlier: demo.R (by convention, R files are given the .R extension). You can download this and save it into the same folder as the demo.csv file. To execute a program file once you have saved it, you use the source command:

source("demo.R")

This example will also produce a chart of the demo data, but this time it saves the result to an image file (using the Portable Network Graphics image format). This is done using the png command:

png("demo.png", width=400, height=400)

The main parameters for this command are the filename of the image you want to produce and the size of the image. After you execute all of your desired charting commands, you must close off the graphics “device” and save the results, which is done using the following command:

dev.off()

To find out more about graphics “devices” in R, including saving to other file formats (such as PDF or JPEG), have a look at ?Devices.
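The same open, draw, close pattern applies to the other devices. As a sketch (this is not the contents of demo.R, and the data and file name are made up), here is the PDF equivalent. One difference worth knowing is that pdf takes its width and height in inches rather than pixels.

```r
# Hypothetical program file: write a chart to a PDF rather than a PNG.
set.seed(1)
x <- data.frame(t = 1:50, value = cumsum(rnorm(50)))   # stand-in data

out_file <- file.path(tempdir(), "demo-sketch.pdf")    # assumed output location
pdf(out_file, width = 5, height = 5)                   # open the PDF device (inches)
plot(x, type = "l", main = "Random Walk")              # draw onto the device
grid()
dev.off()                                              # close the device, writing the file

file.exists(out_file)                                  # confirm the file was written
```

Nothing appears on screen while a file device is open: all charting commands go to the file until dev.off() is called.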

So that’s it. You are up and running producing charts with R. To go further from here, while you wait for further tutorials, you can explore some of the R files I have used to produce charts for the blog. I store quite a few of them here on github.

Gigabang for your buck

This week Fairfax reported on Australia’s broadband pricing “war” in an article appearing in both the Sydney Morning Herald and the Age. The publisher thoughtfully spared online readers the egregious chart that it foisted on readers of the paper editions. Judging from this junk (to use the official adjective for low-quality charts), these newspapers should stick to journalism and steer clear of graphics.

The chart in question was brought to my attention by Mule Stable regular @zebra, who also kindly scanned it (and devised the headline of this post), allowing me to reproduce it here. It shows the pricing of a number of broadband internet plans offered by the four largest internet service providers (ISPs) in Australia.

Terribly designed chart showing prices vs download limits

Chart from print edition of The Age (29 April 2010)

It is a busy chart, made difficult to read by a number of ill-advised design decisions:

  • the horizontal axis reads from right to left rather than the conventional left to right
  • although labeled “Price vs Download”, price is on the horizontal axis, again violating convention*
  • repeating the ISP label for every point adds unnecessarily to the busy-ness of the chart and it also makes the legend redundant
  • labeling each point with the download limit (although not the price), adds more unnecessary ink

These conventions are arbitrary: we could just as well have developed a tradition in the West of reading from right to left, for example. But once a convention is in place, you have to have a very good reason to break with it. Otherwise, you end up making your chart harder for readers to interpret for no good reason.

But perhaps the biggest weakness in the chart is the labeling of the ISPs. Each has its own colour, but this is not enough for the eye to naturally group them together, which makes it hard to track the pricing trend provider by provider. This is easily addressed by connecting the points for each ISP with lines. Once this is done and the other short-comings are also addressed, a couple of anomalies in the data leap out immediately. Compared to their other plans, the Optus 100GB plan and the TPG 150GB appear dramatically over-priced, costing more than other plans that offer more data.

Improved version: Price vs Download limit
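Connecting the points for each provider takes only a few lines of R. The sketch below uses made-up plans for two hypothetical ISPs (not the figures from The Age) and, as an extra step of my own, flags any plan that costs more than a larger plan from the same provider.

```r
# Made-up plan data for illustration (not the figures from The Age).
plans <- data.frame(
  isp   = rep(c("ISP A", "ISP B"), each = 4),
  gb    = c(10, 50, 100, 200, 10, 50, 150, 200),
  price = c(30, 50, 90, 80, 35, 55, 100, 85)
)

# Empty plot first: download limit (independent) on x, price (dependent) on y.
plot(plans$gb, plans$price, type = "n",
     xlab = "Download limit (GB)", ylab = "Price ($ per month)")

isps <- unique(plans$isp)
for (i in seq_along(isps)) {
  p <- plans[plans$isp == isps[i], ]
  p <- p[order(p$gb), ]                      # join points in order of download limit
  lines(p$gb, p$price, type = "b", col = i, pch = 19)
}
legend("topleft", legend = isps, col = seq_along(isps), lty = 1, pch = 19)

# Flag plans that cost more than a larger plan from the same ISP.
overpriced <- do.call(rbind, lapply(split(plans, plans$isp), function(p) {
  p <- p[order(p$gb), ]
  # rev(cummin(rev(...))) is the cheapest price among plans at least this large
  p[p$price > rev(cummin(rev(p$price))), ]
}))
overpriced
```

On the chart such plans show up as a line that rises and then falls, which is exactly the kind of kink that is hard to spot in a cloud of unconnected points.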

Of course, this phenomenon was there in the original chart, but it was hidden. So much so, that the journalist does not appear to have noticed at all as it went unremarked in the article. This is a good example of the power of good charting technique.

There are a number of possible explanations for the anomalous data points. They could simply be errors, although it is certainly not impossible (or perhaps even unlikely) that ISPs have illogical pricing policies. A more likely explanation is that the data includes apples and oranges: the higher-priced plans may be bundles offering additional services such as VOIP that are not included in the other more basic plans. Perhaps if Fairfax had done a better job on the chart in the first place, the journalist may have been prompted to answer this question for us.

* Typically “dependent variables” (the y of “y versus x”) appear on the vertical axis and “independent variables” on the horizontal axis.

Pyramid Perversion – More Junk Charts

Food pyramid charts

Knowing the reaction it would elicit, an old friend of mine sent me a link to an article entitled “Shocking Graphic Reveals Why a Big Mac Costs Less Than a Salad”, which featured the chart pictured here. I did indeed find the graphic shocking, but not for the reason the headline writer intended. The graphic in question, taken from the Consumerist which in turn had taken it from Good Medicine*, shows a pair of charts comparing the levels of subsidies different food types receive in the US compared to recommended dietary intake of corresponding food groups. Needless to say, the foods receiving the largest subsidies are the ones that should be consumed in the smallest proportions and the conclusion: no wonder Americans are getting fatter.

The idea that the US government’s agricultural policies appear to be producing decidedly unhealthy outcomes is one I have been reading about in the fascinating book The Omnivore’s Dilemma: A Natural History of Four Meals (its tale of the sex-life of corn alone makes it worth the price) and so this was not what I found shocking about the graphic. What shocked me was the travesty of data visualization used in the graphic: pyramid charts.

It should not be surprising that charts like this are becoming increasingly common since so many charting tools try to lure you into using them. The screenshot below shows the options that the current version of Microsoft Excel offers under the heading “Column” charts. I would argue that everything below “2-D Column” should be banned from the arsenal of the thinking chart-user. These variants on three-dimensional graphics all represent the trap of “chart junk”: fancy extra details that, at best, add nothing to the information being conveyed and, at worst, result in distortion. Cones and pyramids fall well into the distortion category.

No doubt echoes of the “food pyramid” trope made the choice of pyramids an irresistible temptation for the Consumerist. The problem is that the data is represented by the height of each segment of the pyramid, but we tend to perceive the apparent volume of each layer. As a result, the layers near the top appear much smaller than they should relative to the lower layers**. This serves to drastically exaggerate how little government funding in the US is directed to fruits, vegetables, nuts and legumes. Using a more prosaic bar chart instead shows that, while the funding of meat and dairy is certainly far greater, the ratios are not as extreme as the pyramid suggests.
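The size of the distortion is easy to quantify. The sketch below assumes an idealised pyramid whose layer heights are proportional to the data, and uses made-up shares rather than the subsidy figures. The part of a triangle within a given distance of the apex has an area proportional to that distance squared, and the corresponding part of a pyramid has a volume proportional to the distance cubed.

```r
# Share of a triangle's area / a pyramid's volume taken up by each layer
# when layer *heights* are proportional to the data. Shares are made up.
shares <- c(top = 0.1, middle = 0.3, bottom = 0.6)   # hypothetical data, top layer first

depth <- cumsum(shares)                # distance from the apex to the base of each layer
area_share   <- diff(c(0, depth^2))    # 2-D triangle: area grows with depth squared
volume_share <- diff(c(0, depth^3))    # 3-D pyramid: volume grows with depth cubed

round(rbind(height = shares, area = area_share, volume = volume_share), 3)
```

With these numbers, a top layer representing 10% of the data occupies only 1% of the triangle's area and 0.1% of the pyramid's volume.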

US Food Subsidies chart

The bar chart has the added advantage of making it easier to gauge the funding proportion for each category. Also, in the pyramid, having each layer stacked one on top of another makes it harder to compare one figure with another. The bar chart eliminates the need for moving the shapes around in your mind in an attempt to make these comparisons. Note how close the funding levels are for grains compared to sugar, oil, starch and alcohol, while the pyramid chart makes the funding of grains look significantly higher.

The original graphic compensates by quoting each of the figures, but this defeats the purpose of using a chart. If your chart does not make the numbers evident, use a table instead! The extent of the distortion that the pyramids produce is even more apparent in the case of the recommended diet data. While the recommended intake of sugar, oil and salt is certainly low, on the bar chart this category is no longer vanishingly small.

Recommended Diet Chart

Another visualisation alternative would be to use pie charts. While pie charts do have a bad reputation in statistical and scientific circles, and are often used and abused in many a business presentation, they allow more straightforward comparisons of the contributions categories make to the whole. In the pie chart it is much easier to see at a glance that vegetables and fruit should make up about a third of a regular diet, while protein combined with sugar, oil and salt should make up about a quarter. On the other hand, it is harder to use a pie chart to scan numerical values. For that purpose, the bar chart excels (no pun intended). So when choosing a chart to represent data, it is essential to first decide what aspect of the data you are aiming to highlight.

Diet Pie Chart

The pyramid charts were indeed intended to shock, but there was no need for the authors of the post to resort to misleading exaggeration. The figures should be allowed to speak for themselves. Even when using dispassionate bar charts, it remains clear that the US government is funneling a disproportionate amount of money into the types of food Americans are already over-consuming.

You might also be interested in these posts on charts.

* Thanks to Greg for the updated source.
** As a commenter on Lifehacker observed, this distortion would also occur in 2-D triangles, so it’s due to the shape rather than the 3-D nature of the charts. Having said that, the 3-D versions are far more common and indeed Excel only gives the 3-D options.

Junk Charts #3 – US Business Lending

Today’s “Chart of the Day” from Business Insider’s Clusterstock blog presents an alarming picture of the US economy viewed through the prism of bank business lending. The chart, which I have reproduced below, shows a precipitous collapse in lending*, described in dramatic language as “falling like a knife”. There is no doubt that the US economy remains in very poor health, but should we be getting as excited as Clusterstock?

Annual Change in US Commercial and Industrial Loans

Closer examination of the chart reveals that it is in fact quite misleading.

For a start, it makes the very common mistake of plotting a long series of data without adjusting for the fact that over time the value of the dollar has declined through inflation and the US economy has grown. As a result, more recent movements in the data take on an exaggerated scale.

Also, the chart shows annual changes without providing any sense of the base level of lending. Not only that, while attention is drawn to the US $300 billion annual decline in lending, the increase of close to US $300 billion just over a year earlier is ignored, when in fact the two largely offset one another. Certainly lending has declined, but rather than taking us into historically unprecedented territory, as the Clusterstock chart suggests, it actually means loan volumes are back to where they were in late 2007.

Both shortcomings are addressed in the chart below, which shows the history of loan volumes themselves rather than annual changes and overlays a series scaled by the gross domestic product (GDP) of the US to represent lending in “2010 equivalent” dollars.

US Commercial and Industrial Loans
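One simple way to express a series in “2010 equivalent” dollars is to multiply each observation by the ratio of 2010 GDP to GDP at the time. The sketch below assumes that simple ratio and uses made-up figures, not the FRED data behind the chart.

```r
# Made-up figures for illustration (not the FRED data used for the chart).
loans <- c("1990" = 640,  "2000" = 1090,  "2010" = 1300)    # loan volumes, US$ billions
gdp   <- c("1990" = 5800, "2000" = 10000, "2010" = 14500)   # nominal GDP, US$ billions

# Scale each year's loans by how much the economy has grown since then.
loans_2010_equiv <- loans * gdp["2010"] / gdp

round(loans_2010_equiv)
```

Since nominal GDP captures both inflation and real growth, a single ratio adjusts for both effects at once. The 2010 figure is unchanged by construction.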

Changes in lending do provide a useful reading of an economy’s health. But, it is important to be careful when using annual changes to read its current state. The change from January 2009 to January 2010 is affected just as much by what happened a year ago as by what happened last month. Since monthly data is available, we can in fact look at changes over a shorter period. The charts below show monthly changes, which are probably a little too volatile, and quarterly changes which are probably the best compromise. Since these charts extend only over a five year period, it is not as important to adjust for changes in the value of the dollar and the size of the economy.

Monthly Changes in US Commercial and Industrial Loans

Quarterly Changes in US Commercial and Industrial Loans
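In R, changes over different horizons can be computed from a monthly series with the diff function and its lag argument. This is a sketch on simulated data, not the loan series itself.

```r
# Simulated monthly loan volumes, for illustration only.
set.seed(7)
loans <- 1000 + cumsum(rnorm(60, mean = 2, sd = 10))   # five years of monthly data

monthly   <- diff(loans, lag = 1)     # change on the previous month
quarterly <- diff(loans, lag = 3)     # change on three months earlier
annual    <- diff(loans, lag = 12)    # change on the same month a year earlier

# Each annual change is the sum of the twelve monthly changes it spans,
# so it reflects last year's movements as much as last month's.
all.equal(annual[1], sum(monthly[1:12]))
```

The longer the lag, the smoother the series, but the slower it is to show a turning point.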

Both of these charts reveal an economy that certainly remains unhealthy and lending volumes are still declining. However, the declines of the last couple of years evidently reflect an unwinding of the enormous increases of a few years earlier. So rather than fretting that lending is “falling like a knife”, we can take some comfort from the fact that the rate of decline is diminishing from the worst point of the third quarter of 2009. The moral of the story is that charts can mislead as easily as words and should always be treated with caution.

* The data is sourced from the St Louis Fed “FRED” economic database.

Which countries work the hardest?

Last week over dinner with friends, a debate arose as to whether Australians worked harder than Americans or not. The case for the affirmative argued that many Australians were very successful overseas and indeed Australians working abroad were highly sought after by employers. The case for the negative drew on experiences working with large US firms which exhibited far more aggressive, high-pressure work-practices than Australian firms.

Since we had more wine than data, the argument did not last very long and we instead moved on to the question of whether China now more closely resembles a fascist regime than a communist one (this debate was quickly mired in definitional issues and became rather animated). Reflecting later on the first discussion, I decided to dig up some data on hours worked and attempt to determine a winner for the debate. According to the OECD, Australia and the United States drew very close in 1979 when workers in both countries put in an average of 35 hours per week. But apart from that, over the last forty years US workers have fairly consistently worked an average of 1 to 1.5 hours more each week than Australian workers.

Australia/US Hours Worked

Total Hours Worked per head of Workforce (1950-2008)

And what of the rest of the world? Among the countries covered in the 2008 OECD data, Korea* was by far the most industrious country. Employed Koreans laboured an average of 44.5 hours each week. From there, hours worked fell quickly to Greece on 40.8 hours and then down to the Czech Republic on 38.3 hours. Australia and the United States are in a tightly packed group, ranging from Iceland in seventh place overall on 34.8 hours per week down to Australia in 16th place on 33.1 hours per week. The United States is towards the top of this group, working an average of 34.5 hours and sitting in ninth place overall. The Hanseatic League is not what it once was as Germany, Norway and the Netherlands are clustered at the bottom of the league table, all putting in around 27 hours of work each week.

National Ranking of Hours Worked in 2008*

One shortcoming of these figures is that they do not give an indication of the total effort contributed in each country. This is because the averages are calculated per head of the workforce and ignore children, the unemployed, the sick and the retired. It is conceivable that in countries with fewer workers, those workers may have to work harder to support everyone else. Indeed, recalibrating the numbers based on total hours worked per head of the total population does change the rankings somewhat. Korea still puts in a good showing, but surrenders first place to Luxembourg. Australia climbs a few places to 11th place and in the process pulls one place ahead of the United States, reflecting in part the higher unemployment rate in the United States. Coming in last place is France, which puts in an average of only 13.5 hours of labour per capita.

Two Measures of Hours Worked in 2008*
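The relationship between the two measures is simple arithmetic: hours per head of population equals hours per worker multiplied by the share of the population in work. The sketch below uses three made-up countries (not the OECD figures) to show how the ranking can change.

```r
# Made-up country data for illustration (not the OECD figures).
d <- data.frame(
  country        = c("A", "B", "C"),
  hours_per_week = c(44, 34, 27),     # average weekly hours per worker
  employed       = c(18, 11, 40),     # millions of workers
  population     = c(50, 22, 82)      # millions of people
)

# Hours per head of population = hours per worker x share of population in work.
d$hours_per_capita <- d$hours_per_week * d$employed / d$population

d[order(-d$hours_per_capita), c("country", "hours_per_week", "hours_per_capita")]
```

Here country A has the longest working week, but country B comes out ahead per capita because half of its population is in work.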

But is this data enough to resolve the debate? Unfortunately not. There are too many things that this kind of broad data does not capture. For instance, underemployment is a significant concern in many countries, including Australia and the United States. If there are many people not working as many hours as they would like to, actual hours worked may not be a good indication of the relative industriousness of different countries. Segmentation is another problem. Before our dinner-table debate moved on to China, speculation arose about possible differences in work patterns in US firms based in large cities on the East and West coasts compared to workplaces around the rest of the country. Again, aggregate statistics cannot capture any such differences.

So next time this particular group of friends meets, I will have some data to bring to the table, but not enough to carry the argument.

* Only 2007 data is available for Korea. All other data is for 2008.

Deceptive Charts #2

Last month I wrote about the dangers of secondary axes, but even charts with a single axis can be deceiving. I have been reflecting on this after reading Jon Peltier’s critique of Microsoft’s “professional” charting tutorials earlier this week. One of the charts Peltier takes issue with is a column chart which has the value axis starting at 100 rather than zero. He writes:

This is a major chart fail. The value axis on a column or bar chart should always include zero. Always. If you want to expand the scale to help resolve the values, then a column chart is not the right chart type.

Bar Chart - Bad
Median Income of Readers – Silicon Alley Version

This chart may do a good job of highlighting the Wall Street Journal’s leading position, thereby supporting Silicon Alley’s headline “The Journal Has The Richest Readership Among Print Pubs”. But it also gives a distorted impression of just how solid the Journal’s lead is. Starting the income axis at zero, shown in the chart below, gives a rather different impression. The Wall Street Journal still sits at the top, but the variation across the titles is much less significant than the original chart suggested.

Bar Chart - Good(ish)

Median Income of Readers – Zero-based Version

Nevertheless, precisely because it displays less variation in the data, the zero-based chart does seem less useful and it is harder to read the values. Commenting on Peltier’s post and musing on my posterous Extras blog, I wondered whether starting axes with zero should be considered an inviolable rule of charting. One of the gurus of data visualisation is William S. Cleveland. In his book “The Elements of Graphing Data” he gives this advice: “Do not insist that zero always be included on a scale showing magnitude”. He goes on to make this argument:

For graphical communication in science and technology assume the viewer will look at the tick mark labels and understand them. Were we not able to make this assumption, graphical communication would be far less useful. If zero can be included on a scale without wasting undue space, then it is reasonable to include it, but never at the expense of resolution.

At first glance this would seem to get the Silicon Alley Insider out of chart jail. But the story does not end there. Cleveland’s book focuses on scientific charts, particularly line and scatterplots (also known as X-Y plots) and there is scarcely a bar or column chart to be found. Furthermore, in his paper “Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging” (The American Statistician, November 1984), he makes the following observations:

The bar of a bar chart has two aspects that can be used to visually decode quantitative information—size (length and area) and the relative position of the end of the bar along the common scale. The changing sizes of the bars is an important and imposing visual factor; thus it is important that size encode something meaningful. The sizes of bars encode the magnitudes of deviations from the baseline. If the deviations have no important interpretation, the changing sizes are wasted energy and even have the potential to mislead (Schmid 1983).

Cleveland’s solution to showing data variation without having bar lengths deceive was to invent a new type of chart: the “dot plot”. Dot plots, which I have used here on the Stubborn Mule to illustrate statistics on asylum-seekers and universities, use position alone to encode the data. This means that it is much safer to drop zero from the axis. Although rather tricky to produce using Microsoft Excel (I use the R package), they are a good substitute for bar and column charts. This BeyeNetwork article goes into more detail about dot plots, including the use of multi-panel plots, which I will look at in a future post.

So, here is a dot plot version of the newspaper and magazine rankings by reader income.
Papers - dot plot
Median Income of Readers – Dot Plot Version
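For those wanting to try one, base R includes a dotchart function. The sketch below uses made-up incomes for four hypothetical titles. It is one way to draw a dot plot, not necessarily the one used for the chart above (the lattice package also offers a dotplot function).

```r
# Made-up median incomes for illustration (not the Silicon Alley figures).
income <- c("Paper A" = 135000, "Paper B" = 120000,
            "Magazine C" = 110000, "Magazine D" = 98000)

income <- sort(income)    # dotchart draws the first element at the bottom
dotchart(income / 1000, xlab = "Median reader income ($000)", pch = 19)
```

By default the horizontal axis spans only the range of the data. Since the dots encode position rather than length, leaving zero off does not mislead in the way a truncated bar does.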

Now you know to be vigilant against the deceptive use of axis scales.