Thursday, June 27, 2024

Topic: Politics
Content Type: Analysis
Keywords: elections, polling, 538, rcp

Less is More Even in Election Polling

How Election Forecasting Models Are Overengineered

Overview

538 pioneered the election forecast model under the assumption that polls, particularly early in an election cycle, could be augmented with historic voting data and economic factors. This additional information could plug the holes in still sparse polling that prevailed months before the election. Additionally, it could make projections for the voters who were still undecided or who might change their minds in the ensuing months when the campaigns ramped up and drew more attention from the voters.

Now, there are multiple election forecasting models. The Economist, notably, offers one, and Decision Desk HQ and The NYTimes Upshot both operate one as well. There are also several political bloggers who operate their own. These outfits pride themselves on basing all their analysis on data--historic voting patterns based on geography and economic factors. They are not to be confused with places like The Cook Political Report or CNN which use experts' opinions to guide their predictions.

What seems to get lost in the political news frenzy is whether these forecasts actually add value to the analysis. Do they in fact do a better job of predicting election outcomes than simply relying on polls? That is the claim on which they should be tested. The question is not "did they predict the winner nationally?" but "did they predict the winner better than polls would have?"

Each of these models starts with the polls and weights some more heavily than others depending on their quality. Then it applies a secret sauce that accounts for the demographics of the state or district, the economics, and how voters have responded to these factors historically. Lastly, it adjusts its poll average to predict an actual vote outcome. If a model's projections differ from the polls, it's because of what goes into that secret sauce.

After 2016, the world was in an uproar because the polls and the models all predicted Trump would lose. 538 argues that it was closer than the other models and that it gave Trump some probability of winning, but that should not persuade anyone. What did persuade everyone was 2020. All the models were right; they predicted a Biden win, and that's what they got. However, when you look more closely at the predictions compared to the results, it seems likely that 538 got lucky: just as it overestimated Clinton's vote share in 2016, 538 consistently overestimated Biden's in 2020.

Now, in 2024, I've become very skeptical that these models provide any value. The issue might be particular to Trump, but in both 2016 and 2020, the models provided no information beyond what the polls themselves offered. In fact, they performed worse than the polls. And now, for the 2024 election, 538 has the race at almost 50/50, despite Trump being ahead in the polls, which he never was in 2016 or 2020. Historically, not only have polls underestimated his vote share, but the last two elections have shown that the Democratic candidate needs a national margin of 4-5 points to beat Trump in the Electoral College, because Democratic voters are concentrated in fewer states.

538, like many other organizations, has lost sight of its ultimate goal: to provide better predictive power for an election. It has become a full-fledged politics blog. It differs from most in that it is more focused on data analysis, but as Slate said in 2020, "The polling gurus portray themselves as objective number-crunchers, unswayed by human bias or emotion. But in truth, they are in the reassurance business....[T]hey're hawking a false sense of certainty—and, presumably, racking up earth-shattering levels of web traffic in the process."

2020 Election

In 2020, on election day, 538 projected an 89% chance that Biden would win the election, and hey, they were correct; Biden did win the election, so they must have done well, right?

The thing is, if you look more closely at the state-by-state projections, there was a consistent over-projection for Biden. The "Path to Victory" graphic is a good jumping-off point. In this graphic, states are sorted from most pro-Trump to most pro-Biden, according to the model. A line is drawn through the state that would put either candidate over the 270 threshold. In this case, that state was Pennsylvania, which was colored blue, meaning Biden was likely to win it.

And further, Arizona, Florida, North Carolina, and Georgia were all colored blue. Ohio was the "bluest" state that was expected to go to Trump (or, alternatively, the least-red one). 538 projected that Ohio would end up 49.8% Trump and 49.2% Biden (a margin of 0.6 points for Trump). The election, though, ended up 53.4% Trump and 45.2% Biden (a margin of 8.2 points for Trump). 538 gave Biden an extra 7.6-point boost.
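The Ohio arithmetic can be written out as a short Python sketch. The function names here are my own, chosen for illustration; the only inputs are the four Ohio percentages quoted in this post.

```python
def dem_margin(dem_share: float, rep_share: float) -> float:
    """Democratic share minus Republican share, in percentage points."""
    return dem_share - rep_share


def pro_dem_bias(proj_dem: float, proj_rep: float,
                 actual_dem: float, actual_rep: float) -> float:
    """Projected margin minus actual margin.

    A positive value means the projection was friendlier to the Democrat
    than the actual result was.
    """
    projected = dem_margin(proj_dem, proj_rep)
    actual = dem_margin(actual_dem, actual_rep)
    return round(projected - actual, 1)


# Ohio 2020, using the figures quoted above.
ohio_bias = pro_dem_bias(proj_dem=49.2, proj_rep=49.8,
                         actual_dem=45.2, actual_rep=53.4)
print(f"Ohio pro-Biden bias: {ohio_bias} points")
```

The same function applies to any state once you have the projected and actual two-party shares.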

Of 13 purplish states, every one of them had a pro-Biden bias in 538's model. Ohio tied Wisconsin for the highest--7.6%. The lowest was Georgia at 0.7%. On average, 538 gave Biden a 4.3 percentage point boost.

But perhaps 538's poor results were a consequence of the polls themselves being biased. Perhaps 538 took biased polls and actually de-biased them back towards the actual results. In fact, the opposite happened: 538's adjustments pushed its predictions farther from the actual results.

Compounding that, 538, like RealClearPolitics, takes the raw polls and averages them. Unlike RealClearPolitics, though, 538 tries to average the polls in a way that favors "high quality" pollsters, using a proprietary weighting approach. Theoretically, this should lead to poll results more illustrative of reality than a simple poll average, like RealClearPolitics's, would.
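To make the distinction concrete, here is a minimal Python sketch of a simple average next to a quality-weighted one. Everything in it is an assumption for illustration: the three polls, their margins, and the weights are invented, and 538's actual weighting is proprietary and almost certainly more involved than this.

```python
from statistics import mean

# HYPOTHETICAL polls: Democratic margin in points, plus an invented
# "quality" weight. None of these numbers come from real polling.
polls = [
    {"pollster": "A", "dem_margin": 6.0, "weight": 1.5},
    {"pollster": "B", "dem_margin": 2.0, "weight": 0.5},
    {"pollster": "C", "dem_margin": 4.0, "weight": 1.0},
]


def simple_average(polls):
    """Every poll counts equally (the RCP-style approach described above)."""
    return mean(p["dem_margin"] for p in polls)


def weighted_average(polls):
    """Polls with larger weights pull the average toward their result."""
    total_weight = sum(p["weight"] for p in polls)
    return sum(p["dem_margin"] * p["weight"] for p in polls) / total_weight


print(f"simple:   {simple_average(polls):.2f}")
print(f"weighted: {weighted_average(polls):.2f}")
```

The point of the sketch is only that the weighted average moves toward whichever pollsters are trusted most. Whether that move lands closer to the actual result depends entirely on whether those weights are earned.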

Again, this is not so. The RealClearPolitics simple average performed substantially better than 538's more sophisticated average. Across these states, RCP's average favored Biden (compared to actual election results) by 2.3 points on average, compared to 4.3 for 538. RCP's average got Pennsylvania and Nevada exactly right, and actually gave Trump a bump in Georgia and Minnesota.

2020 RCP and 538 Poll Results vs. Actual Results

Given that the state-by-state projections were much more favorable to Biden than the actual results, in an election he won relatively narrowly, it's hard not to conclude that 538 was more lucky than accurate, and that its more sophisticated approach certainly did not make the model any better at predicting the outcome than a simple average would have. Adding more variables to a model might make it more sophisticated, but it doesn't make it better at doing what it is meant to do.

2016 tells much the same story. RCP's simple average was biased towards Clinton by 2.1 points on average, and 538's weighted average of polls by 3.8 points. Unlike in 2020, however, the "fundamentals" (which 538 uses to adjust the polls) slightly favored Trump. So 538's average bias dropped from 3.8 points pro-Clinton to 3.3 points pro-Clinton, still more biased than RCP.
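For a side-by-side view, the average biases reported in this post can be put in a small Python sketch that measures how much farther off each 538 figure was than RCP's simple average. The dictionary layout and labels are my own; the numbers are only the averages quoted in this post, not a recomputation from state-level data.

```python
# Average pro-Democratic bias in percentage points, as quoted in this post.
reported_bias = {
    2016: {
        "RCP simple average": 2.1,
        "538 poll average": 3.8,
        "538 polls plus fundamentals": 3.3,
    },
    2020: {
        "RCP simple average": 2.3,
        "538": 4.3,
    },
}


def excess_over_rcp(year_biases, baseline="RCP simple average"):
    """How many points more biased each entry is than the RCP baseline."""
    base = year_biases[baseline]
    return {
        name: round(value - base, 1)
        for name, value in year_biases.items()
        if name != baseline
    }


for year, biases in reported_bias.items():
    for name, extra in excess_over_rcp(biases).items():
        print(f"{year} {name}: {extra:+.1f} points vs. RCP")
```

In both years, every 538 figure comes out with a positive excess, which is the whole argument in miniature.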

2016 RCP and 538 Poll Results vs. Actual Results

So what do these models provide? For one, in the 2024 web economy, content is king. A model that has a million levers that are pulled in different directions every day provides more topics and discussion for our 24/7 news cycle's professional pundits and the universe of amateur pundits across the social media landscape. Each day, these models provide a week's worth of information to dissect and comment on.

It also gives more people the ability to find positive news amid the bad. Your candidate can be down in the polls, but if the models are saying he's going to win, you need not despair. And on top of that, it's science--based on regressions and data, which you may not fully understand, but which is more reliable than opinions and polls.
