Poll Dancing

by zebra on 15 July 2013 · 5 comments

With elections looming, and Kevin Rudd’s return to power, it is time for our regular guest blogger, James, to pull out his beer coaster calculator and take a closer look at the polls. 

It is really that time again. Australian election fever has risen. Though in this case it feels like we have been here for three years since the last election. Polls every week telling us what we think and who we will vote for. But what exactly do these polls mean? And what do they mean by “margin of error”?

So here is the quick answer. Suppose you have a two-party election (which is what two-party preferred, 2PP, effectively amounts to under Australia’s preference system). Now suppose each of those parties really has 50% of the vote. If there are 8 million voters and you poll 1,000 of them, then what can you tell? Surprisingly, it turns out that, of these inputs, the figure of 8 million voters is actually irrelevant! We can all understand that if you only poll 1,000 voters out of 8 million then there is a margin of error. This margin of error turns out to be quite easy to compute (using undergraduate-level Binomial probability theory) and depends only on the number of people polled, not on the total number of voters. The formula is:

MOE = k × 0.5 /√N.

where N is the number of people polled and k is the number of standard deviations for the error. The 0.5 is √(p(1−p)) for a true vote share of p = 50%. Since √1000 ≈ 32, 1/√1000 ≈ 0.03 = 3%. The choice of k is somewhat arbitrary, but in this case k = 2 (because for the Normal distribution about 95% of outcomes lie within k = 2 standard deviations of the mean), which conveniently makes k × 0.5 = 1. So MOE = 1/√N is a fairly accurate formula. If N = 1000 then MOE = 1/32 ≈ 3% (give or take). This simply means that even if the actual vote was 50:50 then, 5% of the time, an unbiased poll of 1,000 voters would poll outside 47:53 due purely to random selection. And even if the actual vote is, say, 46:54, the MOE will be about the same, because √(0.46 × 0.54) is still very close to 0.5.
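For anyone who would rather check the beer coaster than trust it, here is a minimal Python sketch of the formula together with a quick simulation. The function names, the number of simulated polls and the random seed are my own illustrative choices, not part of the original calculation.

```python
import math
import random


def margin_of_error(n, p=0.5, k=2.0):
    """k standard deviations of a poll result from n respondents.

    p is the assumed true vote share; with p = 0.5 and k = 2 this
    reduces to the beer coaster formula 1 / sqrt(n).
    """
    return k * math.sqrt(p * (1 - p) / n)


def share_outside(n, p=0.5, k=2.0, trials=2000, seed=1):
    """Simulate unbiased polls of n voters and return the fraction
    of polls that land further than the MOE from the true share p."""
    rng = random.Random(seed)
    moe = margin_of_error(n, p, k)
    misses = 0
    for _ in range(trials):
        votes = sum(rng.random() < p for _ in range(n))
        if abs(votes / n - p) > moe:
            misses += 1
    return misses / trials


print(round(margin_of_error(1000), 4))          # 0.0316
print(round(margin_of_error(1000, p=0.46), 4))  # 0.0315
print(share_outside(1000))  # simulated share of polls outside 47:53
```

The simulated share should come out at roughly 5%, give or take sampling noise in the simulation itself.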

Interestingly, in the US, where there are about 100m voters, they usually poll at N = 40,000, which makes the MOE = 0.5%. In this case the economics of polling scale with the number of voters, hence they can afford to poll more people. But the total number of voters, 100m or 10m, is irrelevant for the MOE. As the formula shows, to improve the accuracy of the estimate by a factor of 10 (say from 3% to 0.3%) they would need to increase the sample size by a factor of 100. You simply can’t get around this.
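Both claims can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and the finite population correction used here is the standard textbook adjustment for sampling without replacement, which the formula above simply ignores because it is so close to 1.

```python
import math


def moe(n, p=0.5, k=2.0):
    """Margin of error ignoring the size of the electorate."""
    return k * math.sqrt(p * (1 - p) / n)


def moe_finite(n, population, p=0.5, k=2.0):
    """Margin of error with the textbook finite population correction."""
    fpc = math.sqrt((population - n) / (population - 1))
    return moe(n, p, k) * fpc


def sample_size_for(target_moe, p=0.5, k=2.0):
    """Smallest number of respondents giving at most target_moe."""
    return math.ceil(k * k * p * (1 - p) / target_moe ** 2)


# The electorate size barely moves the answer.
for voters in (8_000_000, 10_000_000, 100_000_000):
    print(voters, round(moe_finite(1000, voters), 5))

# Ten times the accuracy costs about a hundred times the sample.
print(sample_size_for(0.03), sample_size_for(0.003))
```

The last line shows the sample size needed for 3% against the one needed for 0.3%, which differ by a factor of about 100.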

One of the criticisms of polling is that pollsters don’t reach the same number of (young) people on mobile phones as older people on land lines. This is easily fixed. You just adjust the figures according to what type of phone they are using, based on known percentages of who uses what type of phone. Similarly, you can adjust by gender and age. The interesting thing, though, is that the further your sample’s mix of phone usage/gender/age is from the actual mix, the more you need to increase your MOE, though not your expected outcome.
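Here is a rough Python sketch of that adjustment. Every number in it is made up for illustration, and the Kish effective sample size used to widen the MOE is one common approximation that I am assuming here; it is not necessarily what any particular pollster does.

```python
import math

# Hypothetical figures, for illustration only.
population_share = {"mobile": 0.40, "landline": 0.60}
sample = {
    "mobile": {"n": 200, "support": 0.55},
    "landline": {"n": 800, "support": 0.45},
}


def weighted_estimate(sample, population_share):
    """Re-weight each group's result to its known population share."""
    return sum(population_share[g] * sample[g]["support"] for g in sample)


def effective_sample_size(sample, population_share):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(group["n"] for group in sample.values())
    sum_w = 0.0
    sum_w2 = 0.0
    for name, group in sample.items():
        # Weight per respondent = population share / sample share.
        w = population_share[name] / (group["n"] / total)
        sum_w += group["n"] * w
        sum_w2 += group["n"] * w * w
    return sum_w ** 2 / sum_w2


n_eff = effective_sample_size(sample, population_share)
print(round(weighted_estimate(sample, population_share), 2))
print(round(n_eff), round(1 / math.sqrt(n_eff), 3))
```

With these made-up numbers, 1,000 respondents behave like roughly 800 once the weights are applied, so the MOE widens from about 3.2% to about 3.5%. A sample that already matches the population gets weights of 1 and loses nothing.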

Okay so that is it: MOE = 1/√N where N = number of people polled. If N = 1000 then MOE=3%. My all time favourite back of the beer coaster formula.

The recent jump in the 2PP polls for Labor from about 45% to 49%, when Kevin Rudd reassumed the PM-ship, was greeted by journalists as “Kevin Rudd is almost, but not quite, dead even”. I found this amusing, as it could statistically have been 51%, within the MOE, in which case the headline would have been “Kevin Rudd is ahead!”. Indeed, barely a week later he was “neck and neck” in the polls at 50:50. Next week it may be “51:49”, in which case he will be declared on a certain path to victory! However, within the MOE of 3% these results are statistically indistinguishable.
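To put a number on "statistically indistinguishable", here is a small Python sketch comparing two poll results. It assumes two independent polls of 1,000 people each (the actual sample sizes are not given above) and uses the standard result that the variances of independent estimates add, so the MOE on the gap between two polls is about √2 times the MOE of a single poll.

```python
import math


def moe_of_difference(p1, n1, p2, n2, k=2.0):
    """k standard deviations of the gap between two independent polls."""
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return k * math.sqrt(variance)


def distinguishable(p1, n1, p2, n2, k=2.0):
    """True if the observed gap exceeds the MOE of the difference."""
    return abs(p1 - p2) > moe_of_difference(p1, n1, p2, n2, k)


N = 1000  # assumed sample size for every poll
for before, after in [(0.45, 0.49), (0.49, 0.50), (0.49, 0.51)]:
    gap_moe = moe_of_difference(before, N, after, N)
    print(before, after, round(gap_moe, 3), distinguishable(before, N, after, N))
```

Under these assumptions the MOE on the week-to-week change is about 4.5%, so none of the three pairs clears it, not even the jump from 45% to 49%.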

From my point of view, as a professional statistician, I find the way many journalists develop a narrative based on polls from week to week, without understanding the margin of error, quite annoying. Given the theory that a politician who has “The Mo” (i.e. momentum) may be helped by it to win, it is irresponsible to allow random fluctuation due to statistical sampling error to influence the outcome of an election. Unless of course it helps the party I support win.

5 comments

1 Ken July 15, 2013 at 9:05 pm

What is worse are the unemployment figures. They have a 95% CI for change of plus or minus 0.2 percentage points, so why is there a big fuss about increases or decreases of 0.1 percentage points? The trend is a line that is increasing at about 0.5 pts per year and it doesn’t look like changing.

2 Senexx July 15, 2013 at 9:28 pm

Thanks James. I always wondered how MoE and Confidence Interval were determined and you just answered that question for me. Thank you.

That said the 2PP is nearly always 52:48 or any other number you want to give within the MoE at or near an election, therefore polling proves nothing at all.

I do, however, always take 3% off whichever party is in the lead on a poll as a thought experiment.

Ken, thanks for that information on UE. As I didn’t know how MoE & CI were determined I had not concerned myself too much with UE calculation but you make a good point. (Though I haven’t verified your trend of 0.5).

As the ABS would tell us, as any good finance or economics expert would tell us, and as the erudite audience of Stubborn Mule would know, “the trend is your friend”.

3 James July 15, 2013 at 9:33 pm

Btw it should say “an unbiased poll of 1000 voters should poll outside of 47:53” (5% of the time). Perhaps Sean could fix this?

4 Ken July 15, 2013 at 10:31 pm

James, the June 2013 results are at http://www.abs.gov.au/AUSSTATS/abs@.nsf/mf/6202.0 and the trend over one year can easily be read off the graph. I find it fascinating how the monthly changes distribute themselves around the trend line, which is what would be expected. The 95% CIs for monthly changes are at the end. It would be nice if they supplied quarterly and yearly changes with 95% CIs, as these would be useful.

5 Stubborn Mule July 16, 2013 at 10:19 pm

@James: Done!
