Tag Archives: data visualization

Where Have the Fish Come From?

After reading my posts on the international arms trade, a friend thought I might be interested in some data on the international trade in fish. While I know almost as little about fish as about arms, I always welcome good data. The data in question is published by the Food and Agriculture Organization (FAO) of the United Nations. The FAO also hosts FAOStat, which looks like an interesting data repository. If I can get myself a subscription to this service, it may provide the subject matter for future posts on the Mule.

But back to the fish. The first point my correspondent made was that many fish exporters are also importers. Among the top 50 importers of fish, all but 16 countries also appear in the list of the top 50 exporters. The chart below* gives an indication of the relative scale of fish imports and exports in 2006 of the top 10 importing countries. Of these big importers, only China and Denmark export even more fish than they import.

Fish Trade by the Top 10 Importers (2006)

But the real mystery my fishy correspondent alerted me to is the difference between total worldwide imports and exports of fish. According to the figures, total worldwide imports of fish amounted to US $89.6 billion while exports only amounted to US $85.9 billion. That would appear to mean that US $3.7 billion worth of fish was imported in 2006 from nowhere! While I am sure that statistics of this kind may not be too accurate, the report gives each country’s trade figures to the nearest US $1000, so the gap looks too big to be a rounding error. I speculated that some countries were not admitting to exporting whale meat to Japan, but my correspondent pointed out that whales are not fish. While the US Supreme Court has ruled that tomatoes are vegetables, I do not know its view on whales, and this is probably not the answer anyway. Any theories out there, readers?
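The size of the gap is easy to check. The snippet below is only a sketch in Python (not the R code used for the charts here), and it uses nothing but the two worldwide totals quoted above; the variable names are my own.

```python
# Worldwide totals quoted in the post, in billions of US dollars (2006).
world_imports_bn = 89.6
world_exports_bn = 85.9

# Fish that was recorded as imported without a matching recorded export.
gap_bn = world_imports_bn - world_exports_bn

# The same gap expressed as a share of all recorded imports.
gap_share = gap_bn / world_imports_bn

print(f"Unexplained imports: US ${gap_bn:.1f} billion")
print(f"Share of recorded imports: {gap_share:.1%}")
```

So the fish from nowhere amount to roughly 4% of recorded imports, far more than rounding to the nearest US $1000 could explain.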

At the suggestion of singingfish, I will be making available the code used to produce charts here on the Stubborn Mule. Most of the charts are produced using the R statistical package, which is free and open-source. R can be downloaded here. The data and code for the chart above are here. I will gradually add the code for charts from older posts as well.

UPDATE: I forgot to mention that my correspondent also suggested fish rain as an explanation. I, however, am not convinced. Regardless of the original source, I am sure most countries would treat fish rain as a natural bounty rather than an import.

* Tip for reading the chart: there is no label on the right hand side for the USA and no label on the left for Denmark, but following the lines should make it obvious where they would be if there was room.

The Big Arms Traders

My last post looked at the international arms trade. Taking data from SIPRI, I produced maps showing arms exports for a number of countries, including Australia and the USA. While these maps gave an indication of the spread of arms trading, they did not show which countries are the biggest overall importers and exporters of arms.

To remedy this, I have created two “word clouds”. The first shows arms importers. The size of the text varies with the total value of arms imported over the period 1980 to 2008 (figures are adjusted for inflation and are expressed in 1990 US dollars). The three biggest arms importers over this period were India ($58 billion), Japan ($37 billion) and Saudi Arabia ($35 billion). Australia’s imports over this period totaled $15 billion.

Arms Importers (1980-2008)

The word cloud for exporters is far more concentrated. Between them the USA and Russia* accounted for almost 65% of total arms exports, with exports of $60 billion and $48 billion respectively. France then comes in at a distant third with exports totaling just under $12 billion.

Arms Exporters (1980-2008)

If you like the look of these word clouds, you can easily create your own. With Wordle you can create word clouds which are based on word frequency. This example is based on words used here on the Stubborn Mule (notice the prominent appearance of the word “debt”). For a bit more flexibility, IBM have a freely available Word-Cloud Generator, which can either work on word frequencies or take columns of words and numbers. It is written in Java and is very easy to configure and run. I used it to produce the images in this post.

* As in the previous post, figures for the USSR and Russia have been aggregated.

Love is Old-Fashioned, Sex Less So

Following on from my post on Visualizing the Hottest 100, I noticed that the UK’s Guardian newspaper has published a list of 1000 songs to hear before you die*. The list was assembled from nominations posted by readers. Even before looking at the list, I suspected that the demographic profile of the Guardian’s readers may be a little different to that of Triple J’s listeners. A look at the distribution of year of release in the two lists bears that out.

               Hottest 100   Guardian 1000
Minimum               1965            1916
1st quartile          1984            1968
Median                1994            1977
3rd quartile          1997            1988
Maximum               2008            2008

Year of Release “Five Number” Statistics
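For anyone wanting to compute “five number” statistics themselves, here is a minimal sketch in Python. The function name and the sample years are my own inventions for illustration (they are not the Hottest 100 or Guardian data), and the “inclusive” quartile method is an assumption: other packages and other methods can give slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(years):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    # n=4 asks for the three cut points that split the data into quarters.
    q1, median, q3 = quantiles(years, n=4, method="inclusive")
    return min(years), q1, median, q3, max(years)


# Hypothetical release years, made up purely for illustration.
sample_years = [1965, 1971, 1984, 1991, 1994, 1994, 1997, 2003, 2008]

print(five_number_summary(sample_years))
```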

In fact, fully 14% of the tracks in the Guardian’s list were released before the earliest track in the Hottest 100. Interestingly, that track was Bob Dylan’s “Like A Rolling Stone”, which also features in the Guardian’s list.

While the 1000 songs are not presented in any particular rank order, they are grouped by “theme”. The themes are heartbreak, life and death, love, party songs, people and places, politics and protest and, of course, sex. This allows us to investigate the evolution over time of these different themes.

The chart below is a “box and whisker plot”, also known more prosaically as a “box plot”. It provides a graphical representation of the distribution over songs in each theme by year of release. The box shows the “interquartile range”, from the 1st quartile to the 3rd quartile. This means that half the songs fall inside the box, while a quarter were released in earlier years and a quarter in later years. The solid band shows the median year, which is the year right in the middle of the distribution. The light grey line shows the average year of release. Since most of the distributions are skewed to the right (later years) in the interquartile range [see UPDATE below], the mean is a bit higher than the median. The “whiskers” on the plot extend no more than 1.5 times the width of the box beyond its edges. Any outliers beyond the whiskers are shown as points.

Distribution of Year of Release
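The ingredients of a box and whisker plot can be worked out by hand. The following is a rough sketch in Python rather than the code behind the chart: the function name, the “inclusive” quartile method and the sample years are all assumptions of mine. It follows the usual convention that each whisker stops at the most extreme data point lying within 1.5 box-widths of the box.

```python
from statistics import mean, quantiles


def box_plot_stats(values, whisker=1.5):
    """Compute the box, whisker ends, mean and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # width of the box (interquartile range)

    # Whiskers may reach at most `whisker` box-widths beyond the box.
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr

    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": mean(data),
        "whisker_low": inside[0],    # most extreme points still within the fences
        "whisker_high": inside[-1],
        "outliers": outliers,        # drawn as individual points
    }


# Hypothetical release years, made up purely for illustration.
sample_years = [1916, 1965, 1968, 1971, 1975, 1977, 1980, 1984, 1988]
stats = box_plot_stats(sample_years)
print(stats)
```

In this made-up sample the box runs from 1968 to 1980, the whiskers stop at 1965 and 1988, and 1916 is far enough from the box to be plotted as an outlier.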

So what can be made of these distributions? It looks as though love songs are not as popular as they once were and people and places have fared worse still. But while love may be old-fashioned, sex and party songs have become more prevalent and there is still plenty of heartbreak.

And what of the most popular artists? The three most successful artists in Triple J’s Hottest 100 were Nirvana, Jeff Buckley and Radiohead. Nirvana and Radiohead managed one song each in the Guardian’s list: “Lithium” and “Paranoid Android” respectively (both in the life and death theme). Jeff did not make the list, although his father Tim did, with the song “On Top”. The artist with the most entries in the Guardian’s list was Bob Dylan, and the top 12 features a few who did not make it into the Hottest 100 at all, including Randy Newman, Frank Sinatra and The Kinks.

Artist                Entries
Bob Dylan                  24
The Beatles                19
David Bowie                 9
Randy Newman                8
The Rolling Stones          8
Elvis Presley               6
Frank Sinatra               6
Madonna                     6
Marvin Gaye                 6
Prince                      6
The Beach Boys              6
The Kinks                   6

It’s hard to read much more than that into these numbers, but importantly it gave me the opportunity to use a box and whisker plot which this blog has been sorely lacking.

UPDATE: As Mark has commented, this is a bit of a dodgy explanation. There is only so much that can be deduced about a distribution from a box and whisker plot (appealing though they may be). This histogram shows the distribution of the year of release for life and death songs.

Life and Death Theme Histogram

Mark also pointed out that the box and whisker plot does not really show the relative popularity of the different themes over time. I haven’t used pie charts yet, but I am not a fan, so I have come up with a mosaic plot instead.

Mosaic (II)

This confirms the decline in popularity of the love theme, but suggests that, while sex boomed in the 1990s, it has lost ground again in the 21st century. Heartbreak and party songs are the most popular themes of the current decade. The chart also shows that there are more songs in the list from the 60s and 70s than from the 90s, again a departure from the Hottest 100.
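The numbers behind a mosaic plot are just a cross-tabulation: count the songs in each decade, then split each decade by theme. Here is a minimal sketch in Python, assuming the data is available as (theme, year) pairs. The function name and the handful of sample songs are hypothetical and are not taken from the Guardian’s list.

```python
from collections import Counter, defaultdict


def theme_shares_by_decade(songs):
    """Map each decade to the share of its songs falling in each theme."""
    counts = defaultdict(Counter)
    for theme, year in songs:
        decade = year // 10 * 10  # e.g. 1967 -> 1960
        counts[decade][theme] += 1

    shares = {}
    for decade, themes in sorted(counts.items()):
        total = sum(themes.values())  # sets the width of the decade's column
        shares[decade] = {theme: n / total for theme, n in themes.items()}
    return shares


# Hypothetical (theme, year) pairs, made up purely for illustration.
songs = [
    ("love", 1964), ("love", 1967), ("heartbreak", 1969),
    ("sex", 1994), ("party songs", 1998), ("sex", 1999),
    ("heartbreak", 2005), ("party songs", 2007),
]

shares = theme_shares_by_decade(songs)
for decade, themes in shares.items():
    print(decade, themes)
```

In the plot itself, the width of each decade’s column reflects how many songs it contains, and the heights of the tiles within a column reflect these shares.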

I have added this chart to the Guardian Datastore photo pool on flickr.

* To be precise, there are only 988 different songs in the list (and six are duplicated, each appearing in two different categories).

Olympic Medals per Capita – Update

Since my last post, about Beijing 2008 Olympic rankings by population and economy size, there has been a lot of action in the medals per capita stakes. The Bahamas knocked Jamaica from the number one spot with a Bronze in the triple-jump, only to have Jamaica regain the crown as it continued to win Gold in track and field. Then, with a Silver in the Men’s 4 x 400m relay, the Bahamas got to the front again in what is now an unassailable lead.

For the blow-by-blow on MPC, visit the LA Times MPC blog. I can’t help mentioning that Australia has now pulled ahead of New Zealand!

Previously, the charts I used were static and unable to keep up with these rapid changes. So, although the Games are now drawing to a close, I thought I would include Swivel charts, which will update as the last results come through. This time I am showing rankings in terms of a simple total medal count per million of population (previously I used a points system, 3 points for Gold, etc).
Beijing Olympics 2008: Medals per mil. Population by Country
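The medals per million calculation is simple enough to sketch in a few lines of Python. The function names are my own, and the countries and figures below are placeholders invented for illustration, not the actual Beijing tally or population data.

```python
def medals_per_million(medals, population):
    """Total medal count per million of population."""
    return medals / (population / 1_000_000)


def rank_by_medals_per_million(tally):
    """Sort countries from highest to lowest medals per million."""
    rates = {
        country: medals_per_million(medals, population)
        for country, (medals, population) in tally.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder figures only: country -> (total medals, population).
tally = {
    "Country A": (2, 300_000),
    "Country B": (11, 2_700_000),
    "Country C": (46, 21_000_000),
}

ranking = rank_by_medals_per_million(tally)
for country, rate in ranking:
    print(f"{country}: {rate:.2f} medals per million")
```

A tiny country with a couple of medals can easily top a ranking like this, which is exactly why the lead changes hands so quickly.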

Olympic Medal Count by Population and GDP

Now that the swimming is over, Australia is likely to see its rankings in the Olympic medal tally start to fall. To feel better about this situation, people like to point out that we still look pretty good for a small country. It’s certainly true that, of the countries currently in the medal tally (as at 22 August 2008), we rank only 36th in terms of population. Ever since I blogged about the data-sharing site Swivel, I have been regularly updating a data-set with the medal tally, so it was a simple matter to add in population as well. The chart below provides a high-level overview of the medal results by population. It shows both the total number of medals won and the gold medals. The further a country sits in this chart above a 45 degree line, the better it is doing by population.

Total Medals (blue) and Gold medals by Population


Digging into GroceryCHOICE

Earlier this week, South Australian senator Nick Xenophon raised concerns that the Government’s FuelWatch scheme would lead to higher petrol prices and that small independent petrol retailers were likely to be disadvantaged by the scheme. So it looks likely that the FuelWatch legislation will fail to pass the Senate and then fade into oblivion. I can’t say I’m too upset about this as I have been critical of the scheme. Furthermore, falling oil prices have led to a fall of around 20 cents/litre in petrol prices, which takes much of the sting out of the issue.

So now I am free to turn my attention to another Australian Government initiative, GroceryCHOICE**. This scheme aims to “[help] consumers find the cheapest supermarket chain in their area without having to compare hundreds of prices”. Every month a survey is conducted of prices on around 500 different grocery items at over 600 supermarkets around the region. These prices are aggregated into “baskets” of goods in the following categories:
Continue reading

Online Data and Charts with Swivel

I recently came across the OECD Factbook blog written by Jérôme Cukier, who works as a data editor for the OECD. He has an excellent post on publishing charts in blogs.

As regular readers of the Mule will know, I don’t mind posting the odd chart and in the process I have grappled with the less than ideal results that the Excel to image production-cycle can produce. Jérôme’s process discusses these challenges and illustrates the results of different techniques (although I had more luck with copying as a picture and saving to PNG format than he had, so perhaps the choice of picture editor is a factor as well). As far as possible, I try to avoid using Excel altogether for producing charts and instead use the statistical package R, which can produce charts directly to a number of image formats including JPG and PNG. Although Jérôme doesn’t mention R, it does crop up in the first of the comments on his post.


Drivers of Australian Inflation

Inflation in Australia has been running well outside the 2-3% range targeted by the Reserve Bank of Australia—the most recent figure was 4.3% for the 12 months to March 2008—which is why interest rates have been on the rise for the last couple of years. So what has been driving prices up in Australia? One useful way to get a sense of what has been happening is to use a type of chart known as a treemap (sometimes called a “Map of the Market”). These charts tend to be pretty busy, but can be a great way to explore a rich set of data.
