Whenever a post on this blog requires some data analysis and perhaps a chart or two, my tool of choice is the versatile statistical programming package R. R was developed as an open-source implementation of the S programming language, so it is free. Since commercial mathematical packages can cost thousands of dollars, this alone makes R worth investigating. But what makes R particularly powerful is the large and growing array of specialised packages. For any statistical problem you come across, the chances are that someone has written a package that will make the problem much easier to get to grips with.
If it was not already clear, I am something of an R evangelist and I am not the only one. The growing membership of the Sydney Users of R Forum (SURF) suggests that we are getting some traction and there are a lot of people interested in learning more about R.
Sooner or later, every R beginner will come across An Introduction to R, which appears as the first link under Manuals on the R website. If you work your way through this introduction, you will get a good grounding in the essentials for using R. Unfortunately, it is very dry and it can be a challenge to get through. I certainly never managed to read it from start to finish in one sitting, but having used R for more than 10 years, I regularly return to read bits and pieces, so by now I have read and re-read it all many times. So, useful though this introduction is, it is not always a great place to start for R beginners.
There are many books available about R, including books focusing on the language itself, books on graphics in R, books on implementing particular statistical techniques in R and more than one introduction to R. A few weeks ago I was offered an electronic review copy of Statistical Analysis With R, a new beginner’s introduction to R by John M. Quick. Curious to see whether it could offer a good springboard into R, I decided to take up the offer.
At around 300 pages and covering a little less ground, it certainly takes a more leisurely pace than An Introduction to R. It also attempts a more engaging style by building a narrative around the premise that you have become a strategist for the Shu army in 3rd-century China. The worked examples are all built around the challenge of looking at past battle statistics to determine the best strategy for a campaign against the rival Wei kingdom. Given how hard it can be to make an introduction to a statistical programming language exciting, it is certainly worth trying a novel approach. Still, some readers may find the Shu theme a little corny.
The book begins with instructions for downloading and installing R. It goes on to explore the basics of importing and manipulating data, statistical exploration of the data (means, standard deviations and correlations) and linear regression, and finishes with a couple of chapters on producing and customising charts. This is a good selection of topics: mastery of these will provide beginners with a good grounding in the core capabilities of R. Readers with limited experience of statistics may be reassured that no assumptions are made about mathematical knowledge. The exploration of the battle data is used to provide a simple explanation of what linear regression is, as well as of the techniques available in R to perform the computations. While this approach certainly makes the book accessible to a broader audience, it is not without risks. Statistical tools are notorious for being abused by people who do not understand them properly. As a friend of mine likes to say, “drive-by regressions” can do a lot of damage!
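To give a flavour of the topics listed above, here is a minimal sketch of my own. It is not an example from the book: the battle figures are made up purely for illustration, and the names `battles`, `soldiers` and `casualties` are hypothetical.

```r
# Made-up battle data, for illustration only (not taken from the book)
battles <- data.frame(
  soldiers   = c(1000, 2500, 4000, 5500, 7000),
  casualties = c(120, 260, 450, 540, 730)
)

# Statistical exploration: mean, standard deviation and correlation
mean(battles$casualties)
sd(battles$casualties)
cor(battles$soldiers, battles$casualties)

# Simple linear regression of casualties on army size
fit <- lm(casualties ~ soldiers, data = battles)

# Coefficients, standard errors and R-squared for the fitted model
summary(fit)
```

With only five invented observations this is exactly the sort of “drive-by regression” to be wary of, so treat it as a demonstration of the commands rather than of sound analysis.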
Each chapter adopts the same structure: a brief introduction advancing the Shu story; a list of the topics covered in the chapter; a series of worked examples with sample commands to be entered into the R console followed by an explanatory “What just happened?” section and a “Pop quiz”; suggestions for further tasks for the readers to try; and finally a chapter summary. At times this approach feels a little repetitive (and the recurring heading “Have a go hero” for the suggested further tasks section may sound a little sarcastic to Australian readers at least), but it is thorough.
If I were to write my own introduction to R (one day perhaps?), I would do some things a little differently. I would try to explain a bit more about the semantics of the language, particularly the differences between the various data types (vectors, lists, data frames and so on). But perhaps that would just end up being as dry as An Introduction to R. Also, though I certainly agree with Quick that commenting your code is a very important discipline (even if no-one else ever reads it, you might have to read it again yourself!), I do think that he takes this principle too far in expecting readers to type all of the comments in the worked examples into the console!
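On the data types mentioned above, a short sketch of the kind of thing I have in mind might look like this. The values and variable names are hypothetical and chosen only to fit the Shu theme.

```r
# A vector holds values of a single type
troops <- c(1000, 2500)

# A list can mix types and lengths in a single object
general <- list(name = "General A", campaigns = 5, won = c(TRUE, FALSE))

# A data frame is a list of equal-length vectors, displayed as a table
armies <- data.frame(kingdom = c("Shu", "Wei"), troops = troops)

class(troops)    # "numeric"
class(general)   # "list"
class(armies)    # "data.frame"
is.list(armies)  # TRUE: a data frame is a list underneath

# Lists and data frames share the same $ syntax for extracting components
general$name
armies$troops
```

The last two lines hint at why the distinction matters: the same `$` operator works on both lists and data frames precisely because a data frame is a special kind of list.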
Statistical Analysis With R is a very gentle introduction to R. If you have no prior experience of R, reading this book will certainly get you started. On the other hand, if you have already started experimenting with R, the pace may just be a little too slow.