September 02, 2008

Understanding polls

Before he moved over to his new home at Mother Jones, Kevin Drum revisited a topic on his old Washington Monthly blog that I too have raised before: criticizing reporters who say there is a "statistical dead heat" whenever the difference between voters' preferences for two candidates falls within the poll's margin of error.

In other words, if the polls show 46% for Obama and 43% for McCain with a 3% margin of error, then the race is reported as a "statistical tie" or some such thing, giving the impression that it is a toss-up as to who is ahead. This is simply not true.

Drum consulted two professors of mathematics and statistics at California State University, Chico, and they provided the formulas that enabled him to prepare a handy little chart showing the actual chance that the candidate with the lead is really ahead, even when the lead falls within the margin of error.

Chance that the candidate with the poll lead is actually ahead

                        Percentage lead in the poll
Margin of error      1%    2%    3%    4%    5%    6%
      2%            69%   84%   93%   98%   99%  100%
      3%            63%   74%   84%   90%   95%   98%
      4%            60%   69%   77%   84%   89%   93%
      5%            58%   65%   72%   78%   84%   88%

So if a candidate has a 3% lead with a 3% margin of error, far from being in a dead heat, it is highly likely (an 84% chance) that the candidate is actually ahead. Even if the candidate has only a slim 1% lead and the margin of error is a whopping 5%, it is still not a 'dead heat': the candidate still has a 58% chance of actually being ahead.
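The chart's numbers can be reproduced with a few lines of code. This is a minimal sketch, not the Chico professors' actual formulas (which are not given here). It assumes the reported margin of error is the usual 95% margin for a single candidate's share, 1.96 × sqrt(0.25/n), that nobody is undecided, and that the poll lead is roughly normally distributed. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def chance_ahead(lead: float, margin: float, z95: float = 1.96) -> float:
    """Approximate probability that the candidate leading in the poll is really ahead.

    `lead` and `margin` are both in percentage points. Assumptions (ours):
    - `margin` is the 95% margin of error for one candidate's share;
    - nobody is undecided, so the lead is 2*share - 100 and its standard
      error is twice the standard error of a single share;
    - the sampling distribution of the lead is approximately normal.
    """
    se_single = margin / z95      # standard error of one candidate's share
    se_lead = 2.0 * se_single     # standard error of the lead (difference)
    return normal_cdf(lead / se_lead)


if __name__ == "__main__":
    leads = [1, 2, 3, 4, 5, 6]
    print("MoE  " + "".join(f"{d}%".rjust(6) for d in leads))
    for margin in [2, 3, 4, 5]:
        row = [round(100 * chance_ahead(d, margin)) for d in leads]
        print(f"{margin}%   " + "".join(f"{p}%".rjust(6) for p in row))
```

Under these assumptions every entry in the table comes out the same after rounding to whole percentages. For the 46% to 43% example, chance_ahead(3, 3) is about 0.84; ignoring the 11% who are undecided errs on the conservative side, since undecided respondents shrink the standard error of the lead.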

Like Drum, I do not have much hope that reporters will ever change their misleading reporting because they have a vested interest in continuing to talk this way. Races that are close generate more interest and thus more viewers and readers, so reporters will always try to make them seem closer than they are.

Talking of polls, there seems to have been an explosion in the number of polling organizations out there, and their results differ. This can cause some confusion in the public mind. When one poll gives one result one day and the next day the media report another poll with quite different results, people may get the impression that the race is highly volatile or that some polling organizations are biased in favor of one candidate or the other.

But that need not be true. There is something called the 'house effect' that can skew the results in particular ways without any intention to mislead. Charles Franklin explains what is going on:

Who does the poll affects the results. Some. These are called "house effects" because they are systematic effects due to survey "house" or polling organization. It is perhaps easy to think of these effects as "bias" but that is misleading. The differences are due to a variety of factors that represent reasonable differences in practice from one organization to another.

For example, how you phrase a question can affect the results, and an organization usually asks the question the same way in all their surveys. This creates a house effect. Another source is how the organization treats "don't know" or "undecided" responses. Some push hard for a position even if the respondent is reluctant to give one. Other pollsters take "undecided" at face value and don't push. The latter get higher rates of undecided, but more important they get lower levels of support for both candidates as a result of not pushing for how respondents lean. And organizations differ in whether they typically interview adults, registered voters or likely voters. The differences across those three groups produce differences in results. Which is right? It depends on what you are trying to estimate: opinion of the population, of people who can easily vote if they choose to do so, or of the probable electorate. Not to mention the vagaries of identifying who is really likely to vote. Finally, survey mode may matter. Is the survey conducted by random digit dialing (RDD) with live interviewers, by RDD with recorded interviews ("interactive voice response" or IVR), or by internet using panels of volunteers who are statistically adjusted in some way to make inferences about the population?

Given all these and many other possible sources of house effects, it is perhaps surprising the net effects are as small as they are. They are often statistically significant, but rarely are they notably large.

One way to avoid mistaking inter-poll variability for voter volatility is to track the results of just one poll. In other words, only compare the results of one poll with the earlier results of the same poll conducted using the same methods and questions.
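Even within a single poll, a change from one wave to the next can be sampling noise. A rough check is sketched below, under the same assumptions as the chart (95% margin of error, no undecided voters) plus the assumption that the two waves are independent samples. It compares the change in a candidate's lead with its standard error. The function name and the example numbers are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lead_change_test(lead_before, lead_after, margin_before, margin_after, z95=1.96):
    """Return (z, two_sided_p) for the change in a lead between two waves of one poll.

    Leads and margins are in percentage points. With no undecided voters the
    standard error of a lead is 2 * margin / z95; for independent waves the
    variances of the two leads add.
    """
    se_before = 2.0 * margin_before / z95
    se_after = 2.0 * margin_after / z95
    z = (lead_after - lead_before) / sqrt(se_before**2 + se_after**2)
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical example: a 3-point lead grows to 6 points, 3% margin both times.
    z, p = lead_change_test(3, 6, 3, 3)
    print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With those made-up numbers the lead doubles, yet z is only about 0.7, so a shift of that size is well within what sampling error alone would produce.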

Another way is to do what the outfit Real Clear Politics does. It tries to take some of the inter-poll variability out by giving the averages of the major polls as a function of time.
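A simple version of such an average is sketched below. The inclusion rule is an assumption, not Real Clear Politics' documented method: take each pollster's most recent poll that ended within a fixed window (14 days here) and average the candidates' shares. Running it for each date gives the average as a function of time. The pollster names and numbers are made up for illustration.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional, Tuple


class Poll(NamedTuple):
    pollster: str
    end_date: date
    obama: float   # percent
    mccain: float  # percent


def polling_average(polls, as_of: date, window_days: int = 14) -> Optional[Tuple[float, float]]:
    """Average each pollster's latest poll that ended within `window_days` before `as_of`."""
    cutoff = as_of - timedelta(days=window_days)
    latest = {}
    for poll in polls:
        if cutoff <= poll.end_date <= as_of:
            prev = latest.get(poll.pollster)
            # Keep only the newest poll from each organization.
            if prev is None or poll.end_date > prev.end_date:
                latest[poll.pollster] = poll
    if not latest:
        return None
    k = len(latest)
    return (sum(p.obama for p in latest.values()) / k,
            sum(p.mccain for p in latest.values()) / k)


if __name__ == "__main__":
    polls = [  # hypothetical polls, for illustration only
        Poll("Poll A", date(2008, 8, 20), 45, 44),
        Poll("Poll A", date(2008, 8, 28), 47, 43),
        Poll("Poll B", date(2008, 8, 25), 44, 45),
        Poll("Poll C", date(2008, 8, 30), 48, 42),
    ]
    for day in (date(2008, 8, 26), date(2008, 9, 1)):
        print(day, polling_average(polls, day))
```

Using only the latest poll from each organization keeps one prolific pollster, with its own house effect, from dominating the average.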

To paraphrase Jon Stewart, elections are god's way of teaching Americans statistics.

POST SCRIPT: Mike Huckabee on Colbert Report

It was refreshing to listen to Mike Huckabee being interviewed on the Colbert Report about his reaction (after just the first two days) to the Democratic Convention. Huckabee was one of the most interesting primary candidates on the Republican side but the attacks on him from the Republican Party establishment were quite vicious.

Although I disagree with many of his views, there was something engaging and honest about him that I found likeable. He also has a sense of humor. All these positive characteristics are reflected in the interview. His closing comments on Obama and the role of race in America seemed genuine and heartfelt.


There are a couple problems with this. You might want to talk with a statistician at your own institution to verify (unless I have it wrong!) the following.

Minor point. The numbers in the table depend on the percent of undecided voters. Probably, if we are to construct such a table, it would be best to be conservative and assume none are undecided (gives the smallest answers). That is not what was done here.

Major point. There is no such thing as a "probability" that, say, Obama is ahead in the entire population of voters. This is not a probability statement: it is either TRUE or FALSE. The probability tells you something quite different: IF Obama really were tied or behind, what would be the probability the poll results would have come out less favorably to him?

Subtle distinction.

Love your blog, by the way.

Larry Taylor

Posted by Larry Taylor on September 2, 2008 10:37 PM

reminds me of a couple of other math gaffes.

in canada, where we use the celsius system for temperature, you'll sometimes find temperature comparisons in multiples, as in "this january was twice as cold as last january", because the average temperature was -10C compared to -5C last year. this is wrong, of course, because the celsius scale is keyed to the freezing and boiling points of water, so its zero is not absolute zero or any other absolute standard of coldness.
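The arithmetic behind this point: on the Kelvin scale, whose zero is absolute zero, ratios of temperatures are meaningful, and the two Januaries turn out to be nearly the same.

```latex
\frac{T_{\text{this January}}}{T_{\text{last January}}}
  = \frac{(-10 + 273.15)\ \mathrm{K}}{(-5 + 273.15)\ \mathrm{K}}
  = \frac{263.15}{268.15} \approx 0.981
```

So this January had about 98% of last January's absolute temperature, nowhere near half of it.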

the other was when the first lancet study of casualties in iraq came out, based on household sampling rather than totaling up the media reports as iraq body count does. the result was 100,000 but with a very wide 95% confidence interval, something like plus or minus 92,000, so the media reported that the study found "between 8,000 and 192,000 deaths", as though all numbers in the range were equally likely and all numbers outside the range were excluded. it was not a harmless error, since supporters of bush's war crimes started pretending that the study only showed 8,000 casualties, even though what it showed was a 97.5% likelihood that the number was higher.
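One way to see why the range should not be read as a flat band: under a normal approximation, the relative likelihood of a value falls off smoothly as it moves away from the central estimate. The sketch below uses the commenter's approximate figures (100,000, with a 95% half-width of about 92,000) and assumes a symmetric normal sampling distribution, which is a simplification of whatever the study itself did. The function name is hypothetical.

```python
from math import exp


def relative_likelihood(value, estimate, half_width_95, z95=1.96):
    """Likelihood of `value` relative to the point estimate, under a normal approximation.

    The standard error is recovered from the 95% half-width as half_width_95 / z95.
    Returns 1.0 at the estimate and decreases symmetrically on either side.
    """
    sd = half_width_95 / z95
    return exp(-0.5 * ((value - estimate) / sd) ** 2)


if __name__ == "__main__":
    estimate, half_width = 100_000, 92_000  # commenter's approximate figures
    for value in (8_000, 50_000, 100_000, 150_000, 192_000):
        print(value, round(relative_likelihood(value, estimate, half_width), 3))
```

The endpoints of the interval come out at roughly one-seventh the likelihood of the central estimate (exp(-1.96²/2) ≈ 0.15), and values just outside the interval are only slightly less likely than values just inside it.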

Posted by disgruntled goat on September 3, 2008 09:19 PM

Larry Taylor wrote: "Major point. There is no such thing as a 'probability' that, say, Obama is ahead in the entire population of voters. This is not a probability statement: it is either TRUE or FALSE."
Actually, I believe that this can be viewed as a conditional probability: if event A is "Obama is ahead", then it makes sense to calculate P(A | a ≥ x ≥ b), where x stands for the difference between the percentage favoring McCain and the percentage favoring Obama.

Posted by ollie on September 3, 2008 09:58 PM

Oops! I spoke too quickly last time. On recalculation, I believe the numbers ARE the correct (conservative) ones.

I understand Ollie's notation and I think that is how many would interpret the numbers in the table. (I assume x is a poll result.) I still say A is not a probabilistic event.

For example, under error 3% and lead 4% we see the figure 90%. This is to be interpreted as follows. If the margin of error is given as 3% and the candidates really are tied, then in 10% of such polls Barack would be reported as leading by 4% or more. Taking a random sample is doing something probabilistic and it is meaningful to speak of the probability of how it will come out.
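This reading of the 90% entry can be checked with an exact calculation. The sketch below assumes a truly tied race, nobody undecided, and a sample size chosen so that 1.96 × sqrt(0.25/n) is at most 3 percentage points (1,068 respondents). It then counts the fraction of possible samples in which Obama leads by 4 points or more. The function names are hypothetical.

```python
from math import ceil, comb


def sample_size_for_margin(margin_points: float, z95: float = 1.96) -> int:
    """Smallest n whose 95% margin of error, z95 * sqrt(0.25 / n), is <= margin_points."""
    return ceil((z95 * 0.5 / (margin_points / 100)) ** 2)


def prob_tied_race_shows_lead(lead_points: int, n: int) -> float:
    """Exact chance that a poll of a truly tied race shows one named candidate
    ahead by at least `lead_points` percentage points.

    Assumes nobody is undecided, so if k of n respondents back the candidate,
    the reported lead is 100 * (2k - n) / n points.
    """
    # 100 * (2k - n) / n >= lead  <=>  200 * k >= n * (100 + lead)
    k_min = -(-n * (100 + lead_points) // 200)  # integer ceiling division
    favorable = sum(comb(n, k) for k in range(k_min, n + 1))
    # In a tied race every one of the 2**n response patterns is equally likely;
    # exact integer arithmetic avoids float overflow for large n.
    return favorable / 2**n


if __name__ == "__main__":
    n = sample_size_for_margin(3)
    print(n, prob_tied_race_shows_lead(4, n))
```

Under these assumptions the answer comes out a little under 10%, in line with this interpretation; the small gap from the table's figure reflects the normal approximation and rounding.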

Larry Taylor

Posted by Larry Taylor on September 3, 2008 11:02 PM

It reminds me of the whole 2000 election fiasco, where, if I'm not mistaken, the margin of error in counting votes was orders of magnitude higher than the difference in votes in Florida.

If our country is not willing to get rid of the electoral college, states should at least have some sort of provision for statistical ties, that is, cases where the confidence that one candidate got more votes than the other is below some threshold (maybe 90% or so).

Out of curiosity, does anyone know the typical standard deviation for voting? Of course, there are many ways of defining the "actual vote", so the numbers could differ. One definition might be: Actual Vote = hole that was punched correctly (or however the vote is cast). Although I think the best definition would be more like: Actual Vote = What the person voting intended to vote.

The difference is exemplified by the situation of all the people who "voted" for Buchanan instead of Gore because of the error in the voting book. Under the latter metric the voting system failed for these people because they intended to vote for Gore, but their vote was counted for Buchanan. Under the other metric the votes were "cast" for Buchanan.

Posted by Jared on September 4, 2008 02:18 PM

Entropy is one of “Physics Foibles” – but it is more than Physics. It is in every day experience. If you understand entropy you can understand polls.

Posted by Melvin Goldstein on March 27, 2009 02:38 PM