Saturday, October 6, 2012

Deciphering Public Opinion Polls

For the next month, the press will bombard the American public with a flurry of election polling: some good, some bad, many unintelligible, and almost all of them meaningless. Most of the polls will be presented to support one agenda/candidate/party or another. Usually the substance, if any, will be buried in a steaming pile of irrelevant facts and unsupported assertions.

With this in mind, allow me to suggest some basic rules for understanding and evaluating public opinion polls.

Rule #1: Polls are fundamentally easy to understand.

The first rule is to accept the basic principle that polls are easy to understand--so easy that a news reporter could do it . . . if the reporter actually tried. Regrettably, too many reporters consistently fail even to make the attempt.

Anyone who can add, subtract, multiply, and divide simple numbers can understand the results of a poll. Beyond the basic math, it's just a matter of knowing and applying a few principles and understanding a few key terms.

Rule #2: At best, a poll is an educated guess.

Even if a poll is conducted perfectly, it has a 5 percent (1 in 20) chance of being wrong, meaning the true value lies outside the range the poll reports. In other words, it has a 95 percent chance of being correct. However, because no poll is perfect, the probability of it being wrong is actually greater than 5 percent.

Rule #3: The results of a poll are ranges, not precise points.

Despite appearances, the percentages (or numbers) reported are not precise points, but rather midpoints in ranges of probability. Any credible report of a poll will mention the poll's margin of (sampling) error. This is the range of probability. Anyone who talks as if a poll result is a precise point is either ignorant or shading the truth.

For example, a recent CNN/Opinion Research poll on the presidential election reported President Obama leading Governor Romney 50% to 46% among registered voters with a margin of error of plus or minus (±) 3.5 percentage points. This means that if Obama is polling 50 percent, pollsters are 95 percent sure that the actual value is between 46.5% and 53.5%--3.5 percentage points above or below 50 percent. Similarly, Romney is between 42.5% and 49.5%. As shown in the figure below, in this poll there is an overlap of 3.0 percentage points between the ranges. Because the ranges overlap, the race is a statistical tie according to this poll. This poll suggests Obama may have a slight advantage, but we can't say that with any certainty because the poll is simply not that precise.
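For readers who like to check the arithmetic, here is a minimal sketch in Python of how the two ranges and their overlap are computed. The function names (interval, overlap) are my own, chosen only for illustration; the numbers are the ones from the CNN/Opinion Research poll above.

```python
def interval(share, margin):
    """Return the (low, high) range implied by a reported share and margin of error."""
    return (share - margin, share + margin)


def overlap(a, b):
    """Width of the overlap between two ranges, or 0.0 if they do not overlap."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


obama = interval(50.0, 3.5)   # reported 50%, plus or minus 3.5 points
romney = interval(46.0, 3.5)  # reported 46%, plus or minus 3.5 points

print(obama, romney, overlap(obama, romney))
# prints: (46.5, 53.5) (42.5, 49.5) 3.0
```

Any overlap greater than zero means this poll, taken by itself, cannot tell the two candidates apart.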

Rule #4: A poll is only as good as the sample.

The purpose of an opinion poll is to determine the overall views of a large group by measuring the views of a small group. The small group is known as the sample. Surveying the entire group is simply too expensive and too time-consuming, which is especially true if you try to survey every voter in America.

In most polls, pollsters estimate the views of more than 100 million U.S. voters by interviewing only about 1,000 voters. It may seem absurd that a group of 1,000 can accurately predict the behavior of more than 100 million, but it can--with a surprising degree of precision.

If I flipped a coin 1,000 times, the result would likely be 50% heads and 50% tails within a few percentage points because each coin flip would have a 50% probability of being heads. The more I flipped the coin, the closer the result would likely come to an even split. The same statistical rule applies to opinion polling, except we don't know what the true split is as we do when flipping a coin. It may be 50-50; it may be 52-48. For example, if 52% would vote for President Obama, then as pollsters asked more and more people, the poll's results would tend to get closer and closer to 52%. After asking 1,000 voters, their result should be within about 3 percentage points of 52%. (Of course, there is a 5% chance that the pollsters had a bad run of luck, and their result is outside the range of plus or minus 3 percentage points.)
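You can watch this happen with a small simulation. The sketch below is only an illustration under stated assumptions: it pretends the true split is exactly 52%, that every voter answers, and that each interview is an independent random draw. The function name simulate_poll and the seed value are my own inventions.

```python
import random


def simulate_poll(true_share, sample_size, rng):
    """Interview sample_size random voters; return the fraction backing the candidate."""
    supporters = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return supporters / sample_size


rng = random.Random(2012)  # fixed seed so repeated runs give the same numbers

# Assumed true support of 52%; larger samples should tend to land closer to it.
for n in (10, 100, 1000, 10000):
    print(n, round(simulate_poll(0.52, n, rng), 3))
```

With only 10 interviews the result can land far from 52%; by 1,000 it should usually be within a few points, though any single run can still be unlucky.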

Increasing the sample size beyond 1,000 to 2,000 or even 10,000 increases accuracy, but not by much. The sweet spot that balances accuracy and the expense of conducting the poll is about 1,000 people. This sample size yields a margin of error of approximately ±3 percentage points, much like the CNN/Opinion Research poll, which surveyed about 800 potential voters.
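The relationship between sample size and margin of error can be sketched with the standard textbook approximation for a simple random sample. This formula is not taken from any particular pollster; it assumes a 95 percent confidence level (a multiplier of 1.96) and the worst case of an even 50-50 split, and real polls adjust it in ways not covered here. The function name margin_of_error is my own.

```python
import math


def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a simple random sample.

    Assumes the textbook formula z * sqrt(p * (1 - p) / n), with p = 0.5 as the worst case.
    """
    return 100 * z * math.sqrt(share * (1 - share) / sample_size)


for n in (800, 1000, 2000, 10000):
    print(n, round(margin_of_error(n), 1))
```

Under these assumptions the margins come out to roughly 3.5, 3.1, 2.2, and 1.0 percentage points. Because the margin shrinks with the square root of the sample size, cutting it in half takes four times as many interviews.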

The challenge is that opinion polling depends on a random sample: for a poll to be accurate, each member of the larger group must have an equal chance of being interviewed as a member of the sample of 1,000. If not, the poll has a serious problem. The less random the sample, the more unreliable the poll. For example, if women constituted 75 percent of a sample of U.S. voters, the sample probably isn't truly random; it is "skewed." (It is possible, but highly unlikely, that a random sample could produce such a result.) Similarly, if a poll surveys three times as many Republicans as Democrats, the result would likely be skewed toward Romney and not reflect reality at all.
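Just how unlikely is "highly unlikely"? Here is a rough sketch that computes the exact chance of drawing 750 or more women in a truly random sample of 1,000. It assumes, purely for illustration, that women make up exactly half of all voters; the function name prob_at_least is my own.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k_min, n, p):
    """Exact probability of k_min or more 'successes' in n independent random draws."""
    p = Fraction(p)  # exact arithmetic avoids rounding trouble with tiny probabilities
    total = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))
    return float(total)


# Assumption: women are exactly 50% of voters; sample of 1,000 with at least 750 women.
print(prob_at_least(750, 1000, Fraction(1, 2)))
```

The answer is a number so close to zero that, for practical purposes, a sample like that tells you something went wrong with how it was drawn.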

Pollsters use many techniques to correct skewed samples, but they are imperfect at best. Often, polls demonstrate the principle of GIGO (garbage in, garbage out) from computer science. If the sample (the input) is garbage, the result (the output) will be garbage despite the pollster's best efforts to fix the problems.

Many things can skew a poll, and modern technology is making it even more difficult to obtain a random sample. In Part II, I'll discuss some of the things that can skew a poll.
