R

ngramr – an R package for Google Ngrams

16 July 2013

The recent post How common are common words? made use of unusually explicit language for the Stubborn Mule. As expected, a number of email subscribers reported that the post fell foul of their email filters. Here I will return to the topic of n-grams, while keeping the language cleaner, and describe the R package I […]

27 comments Read the full article →

What is Tony talking about?

17 September 2012

I first experimented with word clouds several years ago and used them to visualise the speeches of Kevin Rudd and Malcolm Turnbull. I have now learned from the Fell Stats blog (via R-Bloggers) that there is an R package for generating word clouds. The package makes use of tm, a text mining package for R, which I have been […]

8 comments Read the full article →

Benford’s Law

16 April 2012

Here is a quick quiz. If you visit the Wikipedia page List of countries by GDP, you will find three lists ranking the countries of the world in terms of their Gross Domestic Product (GDP), each list corresponding to a different source of the data. If you pick the list according to the CIA (let’s […]

24 comments Read the full article →

Hottest 100 for 2011

26 January 2012

Another year, another Australia Day. Another Australia Day, another Triple J Hottest 100. And that, of course, means an excellent excuse to set R to work on the chart data. For those outside Australia, the Hottest 100 is a chart of the most popular songs of the previous year, as voted by the listeners of […]

3 comments Read the full article →

More colour wheels

6 November 2011

In response to my post about colour wheels, I received a suggested enhancement from Drew. The idea is to first match colours based on the text provided and then add nearby colours. This can be done by ordering colours in terms of hue, saturation, and value. The result is a significant improvement and it will capture all of […]

2 comments Read the full article →

Colour wheels in R

5 November 2011

Regular readers will know I use the R package to produce most of the charts that appear here on the blog. Being more quantitative than artistic, I find choosing colours for the charts to be one of the trickiest tasks when designing a chart, particularly as R has so many colours to choose from. In […]

11 comments Read the full article →

A gentle introduction to R

31 January 2011

Whenever a post on this blog requires some data analysis and perhaps a chart or two, my tool of choice is the versatile statistical programming package R. Developed as an open-source implementation of an engine for the S programming language, R is therefore free. Since commercial mathematical packages can cost thousands of dollars, this alone […]

2 comments Read the full article →

Generate your own Risk Characterization Theatre

25 October 2010

In the recent posts Visualizing Smoking Risk and Shades of grey I wrote about the use of “Risk Characterization Theatres” (RCTs) to communicate probabilities. I found the idea in the book The Illusion of Certainty, by Eric Rifkin and Edward Bouwer. Here is how they explain the RCTs: Most of us are familiar with the crowd in a […]

1 comment Read the full article →

The Mule goes SURFing

30 July 2010

A month ago I posted about “SURF”, the newly-established Sydney R user forum (R being an excellent open-source statistics tool). Shortly after publishing that post, I attended the inaugural forum meeting. While we waited for attendees to arrive, a few people introduced themselves, explaining why they were interested in R and how much experience they […]

3 comments Read the full article →

Surf

25 June 2010

A new R user group has launched in Sydney. It aims to bring together both experienced R users and complete beginners. The forum will meet monthly with talks on a wide range of subjects exploring all of the facets of this powerful tool.

0 comments Read the full article →