March 06, 2006
Opinion polls and statistics
In the previous post and in many aspects of life these days, we get quoted the results of opinion polls. Many of our public policies are strongly influenced by these polls, with politicians paying close attention to them before speaking out.
But while people are inundated with opinion polls, there is still considerable misunderstanding about how they work. Especially during elections, when there are polls practically every day, one often hears people expressing skepticism about polls, saying that they feel the polls are not representative because they, personally, and all the people they know, have never been asked their opinion. Surely, they reason, if so many polls are done, every person should get a shot at answering these surveys? The fact that no pollster has contacted them or their friends and families seems to make the poll results suspect in their eyes, as if the pollsters are using some highly selective group of people to ask and leaving out 'ordinary' people.
This betrays a misunderstanding of statistics and the sample size needed to get good results. The so-called "margin of error" quoted by statisticians is found, to a good approximation, by dividing 100 by the square root of the size of the sample. So if you have a sample of 100, then the margin of error is 10%. If you have a sample size of 625, then the margin of error drops sharply to 4%. If you have a sample size of 1111, the margin of error becomes 3%. To get to 2% requires a sample size of 2500.
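The arithmetic above can be checked with a short sketch. It assumes only the rule of thumb stated in the text (100 divided by the square root of the sample size); the function names are made up for this illustration.

```python
import math

def margin_of_error(sample_size):
    """Rule-of-thumb margin of error in percentage points: 100 / sqrt(n)."""
    return 100.0 / math.sqrt(sample_size)

def sample_size_for(margin):
    """Smallest sample size whose rule-of-thumb margin is at most `margin` percent."""
    return math.ceil((100.0 / margin) ** 2)

# The sample sizes quoted in the text.
for n in (100, 625, 1111, 2500):
    print(f"n = {n:4d}: margin of error is about {margin_of_error(n):.1f}%")

# Going the other way: how many people are needed for a 2% margin?
print(sample_size_for(2))
```

Because the margin shrinks only with the square root of the sample size, halving the margin requires four times as many respondents.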
Clearly you would like your margin of error to be as small as possible, which argues for large samples, but your sample sizes are limited by the cost and time involved in surveying people, so trade-offs have to be made. Most pollsters use samples of about 1000, and quote margins of error of 3%.
One interesting point is that there are statistical theorems that say that the sample size needed to get a certain margin of error does not depend on the size of the whole population (for large enough populations, say over 100,000). So a sample size of 1000 is sufficient for Cuyahoga County, the state of Ohio, or the whole USA. This explains why any given individual is highly unlikely to be polled. Since the population of the US is close to 300 million, any one of the 1000 people I may personally know has only a 0.00033% chance of being contacted for a given poll of 1000 people.
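One standard way to see why population size barely matters is the finite population correction for a simple random sample drawn without replacement. The sketch below applies that correction to the 100/sqrt(n) rule of thumb; the population figures are round illustrative numbers, not census data, and the function names are made up for this example.

```python
import math

def margin_with_fpc(sample_size, population):
    """Rule-of-thumb margin (percent) scaled by the finite population correction."""
    base = 100.0 / math.sqrt(sample_size)
    # Correction factor for sampling without replacement from a finite population.
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return base * fpc

def chance_polled(sample_size, population):
    """Chance (percent) that one given person lands in a simple random sample."""
    return 100.0 * sample_size / population

# Illustrative population sizes (assumptions, chosen as round numbers).
for population in (100_000, 10_000_000, 300_000_000):
    print(f"population {population:>11,}: margin is about "
          f"{margin_with_fpc(1000, population):.2f}%")

print(f"chance of being polled: {chance_polled(1000, 300_000_000):.5f}%")
```

For a sample of 1000 the correction factor stays within about half a percent of 1 across all three populations, which is why the same sample size serves a county as well as a country.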
We know that a poll tells us that 54% of Americans say that "I do not think human beings developed from earlier species." The sample size was 1000, which means a margin of error of about 3%. Statistically, this means that there is a 95% chance that the "true" percentage of people who agree with that statement lies somewhere between 51% and 57%.
Certain assumptions and precautions go into interpreting these results. The first assumption is that the people polled are a truly random sample of the population. If you randomly contact people, that may not be true. You may, for example, end up with more women than men, or you may have contacted more old people or registered Republicans than are in the general population. If, from census and other data, you know the correct proportions of the various subpopulations in your survey, then this kind of skewing can be adjusted for by changing the weight of the contributions from each subgroup to match the actual population distribution.
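The reweighting described above can be sketched in a few lines. All the numbers here are hypothetical, invented only to show the mechanics: a sample that over-represents women is adjusted to an assumed 50/50 population split. Real pollsters typically weight on several variables at once, which this sketch does not attempt.

```python
def weighted_share(sample_counts, sample_agree, population_shares):
    """Weight each subgroup's agreement rate by its share of the real population."""
    estimate = 0.0
    for group, share in population_shares.items():
        rate = sample_agree[group] / sample_counts[group]
        estimate += share * rate
    return estimate

# Hypothetical sample of 1000: women are over-represented (600 vs 400).
sample_counts = {"women": 600, "men": 400}
sample_agree = {"women": 360, "men": 180}       # 60% of women, 45% of men agree
population_shares = {"women": 0.5, "men": 0.5}  # assumed true proportions

raw = sum(sample_agree.values()) / sum(sample_counts.values())
adjusted = weighted_share(sample_counts, sample_agree, population_shares)

print(f"unweighted: {raw:.1%}")
print(f"weighted:   {adjusted:.1%}")
```

In this made-up case the unweighted figure is 54.0% and the weighted figure is 52.5%, because the group that agrees more often was counted too heavily in the raw sample.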
With political polls, sometimes people complain that the sample sizes of Democrats and Republicans are not equal and that thus the poll is biased. But that difference is usually because the numbers of people who are officially registered as belonging to those parties are not equal.
But sometimes pollsters also quote the results for the subpopulations in their samples, and since the subsamples are smaller, the breakdown data have a greater margin of error than the results for the full sample, though you are often not explicitly told this. For example, the above-mentioned survey says that 59% of people who had high school education or less agreed that "I do not think human beings developed from earlier species." But the number of people in the sample who fit that description is 407, which means that there is a 5% margin of error in the result for that subgroup (100 divided by the square root of 407), unlike the 3% for the full sample of 1000.
But a more serious source of uncertainty these days is that many people refuse to answer pollsters when they call and it is not possible to adjust for the views of those who refuse. So although the pollsters do have data on the numbers of persons who hang up on them or otherwise refuse to answer, they do not know if such people are more likely or less likely to think that humans developed from earlier species. So they cannot adjust for this factor. They have to simply assume that if those non-responders had answered, their responses would have been in line with those who actually did respond.
Then there may be people who do not answer honestly for whatever reason or are just playing the fool. They are also hard to adjust for. This is why I am somewhat more skeptical of surveys of teens on various topics. It seems to me that teenagers are just the right age to get enjoyment from deliberately answering questions in exotic ways.
These kinds of biases are hard, if not impossible, to compensate for, though in serious research the researchers try to put in extra questions that can help gauge whether people are answering honestly. But opinion polls, which have to be done quickly and cheaply, are not likely to go to all that trouble.
Because of such reasons, polls like the Harris poll issue this disclaimer at the end:
In theory, with probability samples of this size, one could say with 95 percent certainty that the overall results have a sampling error of plus or minus 3 percentage points of what they would be if the entire U.S. adult population had been polled with complete accuracy. Sampling error for subsamples is higher and varies. Unfortunately, there are several other possible sources of error in all polls or surveys that are probably more serious than theoretical calculations of sampling error. They include refusals to be interviewed (nonresponse), question wording and question order, and weighting. It is impossible to quantify the errors that may result from these factors.
For all these reasons, one should take the quoted margins of error, which are based purely on sample size, with a considerable amount of salt.
There is one last point I want to make concerning a popular misconception propagated by news reporters during elections. If an opinion poll says that a sample of 1000 voters has candidate A with 51% support and candidate B with 49%, then since the margin of error (3%) is greater than the percentage of votes separating the candidates (2%), the reporters will often say that the race is a "statistical dead heat," implying that the two candidates have equal chances of winning.
Actually, this is not true. What those numbers imply (using math that I won't give here) is that there is about a 75% chance that candidate A truly does lead candidate B, while candidate B has only a 25% chance of being ahead. So when one candidate is three times as likely as the other to win, it is highly misleading to say that the race is a "dead heat."
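One calculation that reproduces the 75% figure is sketched below. It assumes that the true lead of A over B is normally distributed around the observed 2-point lead, with a standard deviation equal to the 3% margin of error. That modeling choice, and the reading of the result as a probability that A leads, are assumptions of this sketch rather than established facts about the race.

```python
from statistics import NormalDist

observed_lead = 51.0 - 49.0   # A's lead over B in the poll, in percentage points
margin = 3.0                  # quoted margin of error, used here as the std. deviation

# Assumed model: true lead ~ Normal(observed_lead, margin).
lead_model = NormalDist(mu=observed_lead, sigma=margin)

# Probability that the true lead is positive, i.e. that A is really ahead.
prob_a_ahead = 1.0 - lead_model.cdf(0.0)

print(f"A ahead: about {prob_a_ahead:.0%}")
print(f"B ahead: about {1.0 - prob_a_ahead:.0%}")
```

Under these assumptions the split comes out at roughly 75% to 25%; a different choice of standard deviation would change the numbers but not the conclusion that the two candidates are not on an equal footing.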
POST SCRIPT: Film: THE RISE OF THE POLITICS OF FEAR
The Cleveland Institute of Art Cinematheque is hosting a special free screening of the documentary film THE RISE OF THE POLITICS OF FEAR on Monday, March 6, 2006 (i.e., today) at 7:00pm. This documentary by Britain's Adam Curtis is a three-part series shown on the BBC as part of their series on THE POWER OF NIGHTMARES and was broadcast in 2004. The program is 180 minutes long.
Admission is free but an $8 donation ($5 members) is requested. For directions and free parking information, see here.
An article in the Guardian titled The Making of the Terror Myth reviews the documentary, and says in part:
Terrorism, by definition, depends on an element of bluff. Yet ever since terrorists in the modern sense of the term (the word terrorism was actually coined to describe the strategy of a government, the authoritarian French revolutionary regime of the 1790s) began to assassinate politicians and then members of the public during the 19th century, states have habitually overreacted. Adam Roberts, professor of international relations at Oxford, says that governments often believe struggles with terrorists "to be of absolute cosmic significance", and that therefore "anything goes" when it comes to winning. The historian Linda Colley adds: "States and their rulers expect to monopolise violence, and that is why they react so virulently to terrorism."
Here is information from the Cinematheque website.
Here's the most incendiary political documentary since Michael Moore's Fahrenheit 9/11! Adam Curtis' three-part essay, made for the BBC, dissects the war on terror by arguing that fear has come to dominate politics, and that the notion of a secret, organized, international terror network (e.g., Al Qaeda) is a bogeyman created by powerful interests to maintain control. Curtis, whom Entertainment Weekly has called "the most exciting documentary filmmaker of our time," employs extensive scholarship, interviews, and revealing film clips to trace the parallel rise of Islamic fundamentalism and American neoconservatism – mirror images of each other in Mr. Curtis' view. "A superbly eye-opening and often absurdly funny deconstruction of the myths and realities of global terrorism." –Variety.
I am a theoretical physicist and currently Director of 

Comments
Wow, now I feel special- I participated in a poll during the last presidential election season. Who knew I was so unique? Something I've noticed whenever I take telephone surveys is that I get really bored by the end- they basically ask the same questions differently for 15 minutes. I wonder if questions at the end are less likely to be accurate?
What those numbers imply (using math that I won't give here) is that there is about a 75% chance that candidate A truly does lead candidate B, while candidate B has only a 25% chance of being ahead. So when one candidate is three times as likely as the other to win, it is highly misleading to say that the race is a "dead heat."
While I can agree with the ultimate conclusion here, I don't quite buy the math that you "won't give". In fact, the question "what is the probability that A truly leads" is meaningless, unless you assume an a priori probability distribution on the proportion of the population that supports A. To come to grips with this, you either need to talk about null hypotheses and p values, a subject that leaves many beginning students quivering with fear - or you employ the language of Bayesian statistics and talk about relative support for various hypotheses. I'm afraid I really cannot elaborate too much here.
Harald,
I was assuming that the difference in the means was normally distributed with the standard deviation given by the margin of error.
Hmm. And what did you assume about the expected value of this normal distribution? To make clearer what I am driving at, suppose we know that B has always had the lead according to previous polls; maybe there are other polls also showing B in the lead. This extra information would make the conclusion that A now leads even less likely. Statistics is a very tricky business, and I think you have fallen into a classic trap here.
I alluded to hypothesis testing previously. Let me make the calculation: Assume for the sake of argument that the race is truly a dead heat: Exactly half of the voters prefer A, the other half prefer B. Simplifying a bit, we conclude that the results of the poll should be normally distributed around 50% with a standard deviation of 1.5%. What is the probability then of A being ahead in the poll with at least 51%? The answer is about 25%.
In other words, if the race is a dead heat then 25% of polls would show A in the lead with at least 51%, as observed. But we cannot conclude backward from this to a probability that A is in fact in the lead, without further prior information.
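The dead-heat calculation above can be reproduced with a short sketch, assuming the simplified model just described: poll results normally distributed around 50% with a standard deviation of 1.5%. The function name and its default arguments are made up for this illustration.

```python
from statistics import NormalDist

def chance_poll_shows_at_least(threshold, true_support=50.0, std_dev=1.5):
    """Chance that a poll shows A at `threshold` percent or more, given true support."""
    poll_model = NormalDist(mu=true_support, sigma=std_dev)
    return 1.0 - poll_model.cdf(threshold)

# If the race is truly tied, how often would a poll still show A at 51% or more?
p_value = chance_poll_shows_at_least(51.0)
print(f"about {p_value:.0%} of polls")
```

Note the direction of the conditional: this is the chance of the poll result given a dead heat, not the chance of a dead heat given the poll result.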
Oh, and since I haven't posted here before, I should add that I do enjoy your blog. I read it every day. I may have a quibble here, but I don't want to come across as an old grump. (Which maybe I am, but I try to hide it.)
I think I see what you are getting at (and I am by no means an expert statistician) but here is a puzzle I have with that reasoning. Let me see if I understand your point by applying it to a more extreme case. Suppose we assume that the race is truly a dead heat with a standard deviation of 1.5%, then the probability that a poll would result in A getting at least 53% of the vote is about 2.3%.
So if a poll actually had obtained that unlikely result, does it not imply that our assumption that it is a dead heat is unlikely and that A has a greater chance of winning than B?
It is sort of like saying that if I toss a coin ten times and it comes up heads each time, then I am justified in questioning my assumption that the coin is a fair one.
I guess I am not clear if you disagree with the qualitative conclusion (using the numbers given in the original post) that the poll results don't imply a dead heat or you disagree with the actual numerical estimate of a 75% chance that A is actually ahead.
Almost everything you say in the first three paragraphs above is right: But the result cannot be neatly expressed in terms of a probability of A actually being ahead. In particular, if "A has a greater chance of winning than B" is to be interpreted as a statement of higher probability, that is harder to justify.
The poll from the original post does indeed not imply a dead heat. But it lends only a moderate support to the statement that A is ahead. It is the numerical estimate of a probability that I disagree with.
And BTW: I am not an expert statistician either. I'm just a mathematician with some very basic familiarity with statistical methods.
Ok, I get it. Thanks!
So what does one have to do to get a good numerical probability estimate of the chances that A is ahead? Or does that require too many extra assumptions that would make the results of little value?
I'm afraid the latter is the case: Too many assumptions are needed.
There is actually a valuable lesson in here:
Beware of statistical significance.
Much too often, we hear the phrase "statistically significant" thrown about as if it meant "scientifically proven". Nothing could be further from the truth. When you see "the finding is statistically significant (p=0.05)", read that as "assuming what we are trying to show is not so, we could expect a test result this good only 5% of the time".
This tends to obfuscate the important question: What if our hypothesis is still wrong? What are the consequences of falsely assuming the hypothesis right? You cannot possibly tell if that p=0.05 result is good enough unless you know and can weigh the consequences of all four outcomes (the hypothesis is true or false, you conclude either way, a total of 2×2=4 possibilities). The case of drug testing is the obvious example. Clearly, it is bad to reject an effective drug as ineffective, but it can be equally bad to prescribe an ineffective drug "proven" to be effective in an inadequate test. (Some will say it's worse, but that all depends on the side effects and whether the ineffective treatment displaces a better one.) By switching the focus from the true-or-false question to a decision problem (to prescribe or not to prescribe) you set the stage for a much more rational discussion.
The poll example is much more trivial, of course. Few people live or die as the result of an opinion poll, and perhaps the biggest decision someone might make based on the poll is whether or not to bother to vote. "A is sufficiently ahead, so it won't make a difference whether I vote or not." You can demolish that one as well as I can: No fancy statistics required. Wasn't it Gandhi who said the following?
Whatever you do will be insignificant, but it is very important that you do it.
Uh, I seem to have moved off on a tangent.
Never mind.