“FRED” is the St. Louis Federal Reserve Economic Database. It is an excellent repository of economic data, currently boasting 45,000 time series from 42 data sources. The website offers a powerful interface for creating charts of FRED data. Unfortunately, it is a little too powerful, offering a rather dangerous feature: the secondary axis.
I have railed against secondary axes before. They tend to lure the viewer into seeing spurious correlations. Experimenting with FRED, Business Insider has fallen into exactly that trap. In an article entitled “PRESENTING: the ultimate oil currency”, Joe Weisenthal concludes that the euro is surprisingly highly correlated with the price of oil, particularly when oil prices are denominated in gold (OIL.XAU). His evidence is a chart created in FRED (courtesy of the site’s data transformation feature, which allows you to divide the oil price in US dollars by the price of gold in US dollars).
Weisenthal goes on to produce similar charts for the Australian dollar (AUD) and the Canadian dollar (CAD), concluding that they do not track the oil price nearly so well. With superimposed time series like this, the eye is all too easily fooled into seeing correlations which do not exist. Simply separating the lines goes a long way to dispelling this illusion, as the charts below illustrate.
Looking at these charts, the strongest conclusion you would draw is that the euro and the oil price both went up in 2008, with the caveat that the euro started its run somewhat earlier, and then fell again towards the end of the year. At least you would probably agree with Weisenthal that the Australian and Canadian dollars do not track the price of oil.
Rather than using two axes when comparing financial price histories, it is better to scale both series to a common value (say 100) at an initial point and plot the results against a single axis. Doing this for the euro and the price of oil shows that the rise in oil prices in mid 2008 was far sharper than that of the euro, as was the fall towards the end of the year.
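As a rough illustration of the rebasing idea, here is a minimal Python sketch. The function name `rebase` and the sample numbers are made up for the example; they are not FRED data.

```python
def rebase(prices, base=100.0):
    """Scale a price series so that its first observation equals `base`."""
    first = prices[0]
    if first == 0:
        raise ValueError("cannot rebase a series that starts at zero")
    return [base * p / first for p in prices]


# Made-up illustrative numbers, NOT actual euro or oil prices.
eur = [1.25, 1.30, 1.50, 1.40]
oil = [50.0, 60.0, 100.0, 40.0]

eur_idx = rebase(eur)  # both series now start at 100 ...
oil_idx = rebase(oil)  # ... so they can share a single y-axis
```

Once both series start at the same value, differences in the size of their moves are visible directly on the one axis rather than being hidden by two independently chosen scales.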
If that chart is not enough to convince you that Weisenthal’s euro/oil correlation is overblown, perhaps some statistics will help. The absolute price level of the time series is not important. What we need to measure is the correlation of returns (i.e. the percentage change in the prices)*. Daily returns might be a bit noisy, masking any correlations lurking in the data, so I have also calculated correlations for returns over a week (5 trading days) and a month (roughly 20 trading days).
Currency | 1 day Returns | 5 day Returns | 20 day Returns |
---|---|---|---|
AUD | 35% | 35% | 47% |
CAD | -35% | -34% | -46% |
EUR | 20% | 15% | 27% |
Correlation of Returns to OIL.XAU
The correlation between the euro and the oil price is unimpressive, only reaching 27% for monthly returns. Perhaps surprisingly, it is the Australian dollar that shows the highest correlation to oil. Then again, that is probably only surprising after looking at Weisenthal’s chart. After all, the Australian dollar is known as a “commodity currency”. But even for the Australian dollar, a 47% monthly return correlation is not very high.
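For anyone wanting to try this sort of calculation themselves, the sketch below shows one way to compute the correlation of k-day returns in plain Python. It assumes overlapping (rolling) k-day windows and a standard Pearson correlation; the function names and the sample prices are invented for the example and are not the data behind the table.

```python
import math


def returns(prices, k=1):
    """Percentage change over k periods, using overlapping (rolling) windows."""
    return [prices[i] / prices[i - k] - 1.0 for i in range(k, len(prices))]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def return_correlation(p, q, k=1):
    """Correlation between the k-period returns of two price series."""
    return pearson(returns(p, k), returns(q, k))


# Made-up prices for illustration only.
fx = [1.30, 1.32, 1.31, 1.35, 1.34, 1.38, 1.36]
oil_in_gold = [0.080, 0.083, 0.081, 0.082, 0.086, 0.085, 0.088]

corr_1d = return_correlation(fx, oil_in_gold, k=1)
corr_2d = return_correlation(fx, oil_in_gold, k=2)
```

Working with returns rather than price levels is the point of the exercise: two trending price series can look tightly linked even when their period-to-period changes are only weakly related.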
Once again, the lesson here is to beware of secondary axes. If I were running the FRED site, I would ban the feature immediately.
* The problems with computing correlations between serially correlated time series, such as price data, are well known. See for example Granger and Newbold, “Spurious Regressions in Econometrics” (1974).
12 comments
Stubby,
Good point.
Just one question, related to this: “Rather than using two axes when comparing financial price histories, it is better to scale both series to a common value (say 100) at an initial point and plot the results against a single axis.”
Wouldn’t it be even better to simply draw a scatter plot?
A scatter plot is the best chart if you are plotting the returns for each series rather than the original price series.
EPIC TAKEDOWN. Kudos.
Yup!
Plotting the two time series on the same axis by scaling is also deceptive. Correlation has nothing to do with volatility but you use the relative volatilities of the two series as evidence for lack of correlation. Or at least lack of evidence for correlation. By using separate axes the volatility differences are scaled out.
I would also be careful about using different windows for the correlation calculation. For time series without autocorrelation, the expected value of the 10 day correlation is the same as the expected value of the 1 day correlation. The “noisiness” of the shorter window series divides out when calculating the correlation from the covariance and volatilities. Even with simple autocorrelation they don’t change. It is a common misconception that choosing a longer window gives a better correlation estimate because the longer window returns are less “noisy”, when really it is only day effects, which are hard to model and quantify, that are important.
I personally wouldn’t reject the hypothesis at 95% confidence level that all 9 correlations are “the same”. The error on the 20 day estimate over 7 years is about 10% due to the smaller number of degrees of freedom (about 100) compared to the 1 day estimate (about 2000).
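A rough way to see the sizes involved is the usual large-sample approximation for the standard error of a sample correlation, sqrt((1 - r²) / (n - 2)), with n taken as the number of non-overlapping intervals. The sketch below is only that approximation under those assumptions; the function name is made up, the r values are the EUR figures from the table in the post, and the counts of 100 and 2000 are the approximate figures quoted above.

```python
import math


def corr_std_error(r, n):
    """Approximate standard error of a sample correlation r.

    Assumes n roughly independent observations (here, the number of
    non-overlapping return intervals) and uses sqrt((1 - r^2) / (n - 2)).
    """
    if n <= 2:
        raise ValueError("need more than 2 observations")
    return math.sqrt((1.0 - r * r) / (n - 2))


se_20_day = corr_std_error(0.27, 100)   # EUR 20 day correlation, ~100 intervals
se_1_day = corr_std_error(0.20, 2000)   # EUR 1 day correlation, ~2000 intervals
```

On these assumptions the 20 day estimate carries an error of roughly ten percentage points, several times larger than the error on the 1 day estimate.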
@Zebra: I prefer the small multiples to the indexed plot as a device for comparing what the levels of the two series are doing. You are, of course, correct that volatility is different from correlation, but a chart with a secondary axis does more than just scale, it also shifts. Admittedly, the small multiples plot does this too, but the thing that really gives the illusion of “correlation” is the fact that the two series seem to be on top of one another. For a chart with two y-axes, this is really an artefact (as an example, have a look at the charts in this post). If you saw the series on top of each other in the index chart, it really would show that they behaved very similarly. If they were similar, but one was less volatile, the pattern would still be quite clear on the index chart.
As for the correlation, I agree that very little can be read into the differences between the different figures. Note also I should have mentioned that the 5 and 20 day correlations are correlations of the rolling differences, so I haven’t missed as many data points, but the series has autocorrelation.
I’ll take the other side of the argument (for argument’s sake!). What is often referred to as correlation is in fact mean reversion. For instance, to say that the price of wheat and potatoes are correlated (not sure if they are, but they sure are both starchy) may mean that their relative levels are somewhat constant over time. This is in contrast to their changes (or percentage changes) being statistically correlated. These are not one and the same, as even series with highly correlated changes may drift apart over time, and series with uncorrelated changes may time and again converge back to the same relative level. For the former, agreed, a 2-axes plot is irrelevant. For the latter, however, it can be a very useful data exploration tool, before going about fitting an Ornstein-Uhlenbeck process or the like. Another one is to simply plot the ratio. In this instance we can see the ratio of gold-denominated oil to EURUSD has indeed traded within a very narrow horizontal (i.e. stationary) band at least for the last 3 years: http://research.stlouisfed.org/fredgraph.png?g=8zz
@Wisdomtooth: taking up your suggestion, I’ve produced plots of the ratio of the AUD/USD, CAD/USD and EUR/USD over the price of Oil in gold. You are right that the ratio for the EUR stabilises somewhat over the last few years (not over the whole period back to 2007), but so does the series for the AUD! Going back further to 2000, the ratios really don’t look very stable.
@Wisdomtooth: your general point is a good one though. When seeing (apparently) overlapping series on a dual axis plot, the reaction is to conclude there is a significant correlation. While this may be the case, as you say, highly correlated series may not appear to overlap at all. So, either it’s a good tool for a different job (although I’d say it will tend to falsely pick up cointegrated series too) or a bad tool for the job people think they are using it for!
Whether you use overlapping samples or not turns out not to matter. In both cases (non-overlapping and overlapping, i.e. moving window) the expected value of the number is the population correlation. The error depends not on the number of samples but on the number of degrees of freedom, which is the same in both cases, i.e. the number of contiguous, non-overlapping intervals. So there is no benefit, but nothing wrong either, with using overlapping samples provided you remember the error isn’t improved.
Alternatively you can model this directly noting the autocorrelation due to the moving window as you point out. Again the expected value is the same but when you calculate the error you’ll find the benefit of increased number of samples is exactly cancelled out by the effect of the autocorrelation. Voila!
You should send a link to this article to Alan Kohler – the man is a master of false inferences. It might give him some ideas for other ways to talk up false correlations. Though given he just sold his webshite for $18m you have to give him credit. As ever there is not much money in the truth.
@Stubby: Yeah, that’s it, people say ‘correlated’ when they really mean ‘cointegrated’. But, who can blame them? Plus, only the micromanager (e.g. day/HF trader) cares about correlations. Longer horizon decisions are about whether things will turn out well in the end (and “if they are not now, it’s not yet the end”).