Long-time readers of the Stubborn Mule will know that charts are a regular feature here. Almost all of these charts were produced using the R statistical software package which, in my view, produces far superior results to the most commonly used graphing tool: Excel. As a community service to help rid the world of horrible Excel charts, here is a quick tutorial on charting using R. Since R is a powerful and versatile tool, there is a lot more to it than is covered here, so there may be more tutorials to come.
Installing and Running R
The first step is to get R installed on your computer. R is open source and can be downloaded for free from the Comprehensive R Archive Network (CRAN). It comes in many flavours: Mac, Windows and Linux.
Once you have installed R and fired it up, you are presented with something that looks very different to Excel. This is the first indication that R is an interactive programming environment, not a spreadsheet. You will see various messages, including copyright information and instructions on how to display licence information, run a demo and get help. Finally, you are presented with a command prompt: “>”. R is now waiting for you to type commands.
As an example, try entering the following command:
getwd()
This will display the current “working directory” (hence “wd”), which is the default folder that R will use for reading and writing data. You can easily change the working directory, either by using the drop-down menus (which menu option varies depending on whether you are using Windows, Mac or Linux) or by using the setwd command:
Unless you have a “Mule Docs” folder in a “Documents” folder, you will need to substitute the name of one of your own folders, otherwise you will get an error message. Note that you need to use forward slashes (“/”) rather than backslashes (“\”) even on Windows.
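As a rough sketch of the two commands together, something like the following should work. The path is only an illustration (I have assumed the “Mule Docs” folder sits in “Documents” under your home directory), so substitute one of your own folders. The check with dir.exists is my addition to avoid the error message mentioned above.

```r
getwd()                                  # where R currently reads and writes files

# Hypothetical path: substitute one of your own folders.
target <- "~/Documents/Mule Docs"

if (dir.exists(target)) {
  old <- setwd(target)                   # setwd() invisibly returns the previous directory
  print(getwd())                         # now the new folder
  setwd(old)                             # switch back again
} else {
  message("Folder not found: ", target)  # setwd() itself would stop with an error here
}
```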
You can see detailed explanations of any R command by prefixing the name of the command with a question mark:
?setwd
This is short for help(setwd). Of course, this assumes you know the name of the command already. To search the documentation for a keyword, use a double question-mark. For example
??median
will show a list of all the commands which feature the word “median” in their documentation. This is short for help.search("median"). Note the use of double quotes (") here, which are not required in the ?? syntax.
Reading Data and Charting
To get started, here is a simple data file in CSV format (“comma separated values”). Download it and save it in your working directory (or save it somewhere else and then change R’s working directory to where you just saved the file). You can then load the data into R with the following command:
x <- read.csv("demo.csv")
While the read.csv part is self-explanatory, the “<-” may look a little odd. It is the assignment operator. Whereas most programming languages simply use an “=” to assign to variables, R uses what is intended to look like an arrow. In this case, you should interpret the command as saying “read the contents of the file demo.csv and place the result in the variable x”. To see the contents of x, you can simply type x at the command line and press return, which will display a table with all the data read from the demo.csv file. When dealing with larger “data frames” (to use the R lingo for this type of object), having that much data flash by may not be very useful. Some other useful commands for quickly inspecting your data are:
head(x)
tail(x)
summary(x)
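If you do not have demo.csv to hand, here is a minimal sketch of the same steps using a tiny stand-in file. The column names and values are made up purely for illustration and are not the contents of the real demo.csv.

```r
# Write a small stand-in CSV to a temporary file (hypothetical data, not the real demo.csv).
path <- tempfile(fileext = ".csv")
writeLines(c("t,value", "1,0.5", "2,-0.3", "3,1.2", "4,0.8"), path)

x <- read.csv(path)   # read the file into a data frame called x

class(x)      # the type of object read.csv returns
dim(x)        # number of rows and columns
head(x, 2)    # first two rows
tail(x, 2)    # last two rows
summary(x)    # per-column summary statistics
```

The second argument to head and tail is optional; without it, both show six rows.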
Now you are ready for your first graph. Try this command:
plot(x)
You should see a simple, clean scatter-plot. If you would prefer a line graph, this is easily done too.
plot(x, type="l")
The plot function has many options, which you can explore in the documentation (just enter ?plot). There are also various commands for adding further annotations to your chart. Try the following commands:
grid()
axis(side=4)
text(2, -4, "Random Walk")
These will add gridlines, put axis labels on the right-hand side (R numbers chart sides from 1 to 4, starting from the bottom and working clockwise) and finally display text on the chart.
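To see the charting commands working together without the data file, here is a sketch that uses a simulated random walk as a stand-in. The data, the column names and the position of the label are all my assumptions, so the coordinates differ from the ones used with demo.csv.

```r
# Stand-in data (an assumption): a simulated random walk, not the real demo.csv.
set.seed(1)
x <- data.frame(step = 1:100, value = cumsum(rnorm(100)))

plot(x, type = "l")     # line graph of value against step
grid()                  # gridlines at the axis tick marks
axis(side = 4)          # a second numbered axis on the right (side 4)

# Label placed at step 50, just below the highest point of the walk.
text(50, max(x$value), "Random Walk", pos = 1)
```

Dropping type = "l" from the plot call gives the scatter-plot version instead.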
Using Program Files
Using R interactively like this is useful for familiarising yourself with the system and for performing quick calculations, but if you find yourself wanting to make small changes here and there, it will quickly become annoying re-typing long commands. This is when you should move to using program files. All that this involves is saving a series of R commands to a file using a text editor (you can just use a simple text editor like Notepad or TextEdit, but many fancier applications can help out by automatically highlighting R commands in different colours, a trick known as “syntax highlighting”). Here is one I prepared earlier: demo.R (by convention, R files are given the .R extension). You can download this and save it into the same folder as the demo.csv file. To execute a program file once you have saved it, you use the source command:
source("demo.R")
This example will also produce a chart of the demo data, but this time it saves the result to an image file (using the Portable Network Graphics image format). This is done using the png command:
png("demo.png", width=400, height=400)
The main parameters for this command are the filename of the image you want to produce and the size of the image. After you execute all of your desired charting commands, you must close off the graphics “device” and save the results, which is done using the following command:
dev.off()
To find out more about graphics “devices” in R, including saving to other file formats (such as PDF or JPEG), have a look at ?Devices.
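Putting the pieces together, a minimal sketch of the open, draw, close sequence might look like this. The file name and the plotted numbers are placeholders of my own, and the image is written to a temporary folder rather than your working directory.

```r
out <- file.path(tempdir(), "sketch.png")   # hypothetical file name in a temporary folder

png(out, width = 400, height = 400)         # open a 400 x 400 pixel PNG device
plot(1:10, (1:10)^2, type = "l")            # charting commands now draw into the file
grid()
dev.off()                                   # close the device so the file is written

file.exists(out)                            # check that the image was created
```

Nothing appears on screen while the png device is open, since the charting commands are drawing into the file instead.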
So that’s it. You are up and running producing charts with R. To go further from here, while you wait for further tutorials, you can explore some of the R files I have used to produce charts for the blog. I store quite a few of them here on GitHub.
Thanks mule, good newb intro for hacks like myself. Finally have some spare time to play around with R (soon), it looks pretty cool, although not sure it can compete with Matlab…
AJ: I once heard someone say that R is a package for statisticians with some mathematical functionality thrown in and Matlab is a package for mathematicians with some statistical functionality thrown in and that’s probably not a bad summary. Of course, the big advantage R has is that it is free (and, unlike so much that is free, it is actually useful). Once you are prepared to spend money on a package (or, perhaps more likely in practice, someone else’s money), I would throw Mathematica into the mix as another package that is vastly more powerful than R. In the meantime, R is great for the penny-pinching enthusiast.
AJ: Here’s a quick and dirty comparison of R, Matlab and a few others.