Hottest 100 for 2011

by Stubborn Mule on 26 January 2012 · 3 comments

Another year, another Australia Day. Another Australia Day, another Triple J Hottest 100. And that, of course, means an excellent excuse to  set R to work on the chart data.

For those outside Australia, the Hottest 100 is a chart of the most popular songs of the previous year, as voted by the listeners of the radio station Triple J. The tradition began in 1991, but initially people voted for their favourite song of all time. From 1993 onwards, the poll took its current form* and was restricted to tracks released in the year in question.

Since the Hottest 100 Wikipedia pages include country of origin**, I thought I would see whether there is any pattern in whose music Australians like best. Since it is Australia Day, it is only appropriate that we are partial to Australian artists and they typically make up close to half of the 100 entries. Interestingly, in the early 90s, Australian artists did not do so well. The United Kingdom has put in a good showing over the last two years, pulling ahead of the United States. Beyond the big three, Australia, UK and US, the pickings get slim very quickly, so I have only included Canada and New Zealand in the chart below.

Number of Hottest 100 tracks by Country

If you have excellent eyesight, you may notice that 2010 is missing from the chart. For some reason, this is the only year which does not include the full chart listing on the Wikipedia page. There is a link to a list on the ABC website, but unfortunately it does not include the country of origin. Maybe a keen Wikipedian reading this post will help by updating the page.

I make no great claims for the sophistication or the insight of this analysis: it was really an excuse to learn about using the XML package for R to pull data from tables in web pages.

require(XML)
require(ggplot2)
require(reshape2)

results <- data.frame()
col.names <- c("year", "rank", "title", "artist", "country")

# Skip 2010: full list is missing from Wikipedia page
years <- c(1993:2009, 2011)

for (year in years) {
    base.url <- "http://en.wikipedia.org/wiki/Triple_J_Hottest_100,"
    year.url <- paste(base.url, year, sep="_")
    tables <- readHTMLTable(year.url, stringsAsFactor=FALSE)
    table.len <- sapply(tables, length)
    hot <- cbind(year=year, tables[[which(table.len==4)]])
    names(hot) <- col.names
    results <- rbind(results, hot)
}

# Remap a few countries
results$country[results$country=="Australia [1]"] <- "Australia"
results$country[results$country=="England"] <- "United Kingdom"
results$country[results$country=="Scotland"] <- "United Kingdom"
results$country[results$country=="Wales"] <- "United Kingdom"
results$country[results$country=="England, Wales"] <-"United Kingdom"

# Countries to plot
top5 <- c("Australia", "United States", "United Kingdom",
  "Canada", "New Zealand")

# Create a colourful ggplot chart
plt <- ggplot(subset(results, country %in% top5),
    aes(factor(year), fill=factor(country)))
plt <- plt + geom_bar() + facet_grid(country ~ .)
plt <- plt + labs(x="", y="") + opts(legend.position = "none")

Created by Pretty R at inside-R.org

UPDATE: there is a little bit more analysis in this follow-up post.

* Since the shift to single year charts, there have been two all-time Hottest 100s: 1998 and 2009.

** There are some country combinations, such as “Australia/England”, but the numbers are so small I have simply excluded them from the analysis.

Possibly Related Posts (automatically generated):

{ 2 comments… read them below or add one }

1 Rob Syme January 27, 2012 at 1:58 pm

The Hottest 100 list for 2010 is not up on Wikipedia due to the lack of a response from the list publisher. Unfortunately, the WP solicitors have advised that we can’t publish the list until ABC gives explicit permision. I’ll call up the doctor or Linda Marigliano this afternoon and see if we can get some ABC lawyers to give it the OK.

2 Stubborn Mule January 27, 2012 at 5:05 pm

Interesting, thanks Rob! Does that mean WP has permission for the other years (including 2011), or is there a bit of risk being run there?

Leave a Comment

{ 1 trackback }

Previous post:

Next post: