Nate Silver's Finest Hour (Part 2 of 2)
When more accurate isn't accurate enough
On the previous Good Reason: days before the 2016 Presidential Election, the election forecasters are in a heated debate. Most give Trump slim-to-none chances. Nate Silver and his valiant band at 538, however, sound the alarm that Trump still has a shot. The other forecasters and pundits — especially at the Huffington Post — won’t have any of it. Hit pieces and Twitter feuds fly, arguing whether Nate Silver has lost it.
IV. Election Day 2016
In his original hit piece on Nate’s forecast, the Huffington Post’s Ryan Grim says, “So who’s right? The beauty here is that we won’t have to wait long to find out.”1
Yet on Election Day itself, the Huffington Post cannot wait for official results before calling Nate Silver a big fat failure. While people are still voting, HuffPo publishes “I’m A Stats Prof. Here’s Why Nate Silver’s Model Was All Over The Place.” It contains this gem (emphasis mine):
Now that the votes are finally being cast and counted, it’s time to turn to that hallowed American tradition: rating how well pollsters and meta-pollsters did. I cannot claim to have an opinion on how most of these performed. However, I did take a look into one of the best-known meta-pollsters: Nate Silver’s FiveThirtyEight.com. Unfortunately for 538, it seems we can call the race early: things did not go well.
This is at 4 PM EST, in the middle of Election Day. I can’t help but marvel at the hubris. It’s like a baseball announcer saying, “We’re at the top of the first inning, but we can call it early: the coach did a horrible job in this soon-to-be loss.” I can only express my appreciation that they left the article up for everyone to see, timestamp and all.
For the next few hours, the 538 v. everyone battle falls silent. Each forecaster holes up in their own site’s live election coverage. Reporting in respective silos, rather than the open battlefield of Twitter, offers a temporary peace.
Around 8 PM EST, the Clinton’s-got-this camp is happily posting the signs of an upcoming victory. Sam Wang, who gave Clinton a 99% chance of winning and whom Wired crowned the “new election data king”, says in his Election Night live coverage:
8:13pm: I’m unaware of any advance indications of Trump overperformance. On the contrary, we have: (a) early voting neutral or more Democratic than 2012; (b) massive Latino voting; and (c) high turnout. If I had to guess, I’d say any error will favor Clinton.
From Ryan Grim’s Twitter feed at 8:24 PM:
Around 9:00 PM EST, things start to diverge from the expected narrative.
There’s a shift in all the non-538 election forecasters. Their posts, previously even-keeled and frequent, slow in tempo and darken in tone. You can feel the creeping dread setting in that night.
Sam Wang posts 9 updates from 8 to 9:15 PM EST in his election blog. From 9:15 PM onward - the next 2.5 hours - he posts only 4 updates. Double the time, half the posts. To my frustration, the Huffington Post’s live election coverage from that night is gone. Still, in Ryan Grim’s Twitter feed, election night casts the same pall. After a little flurry of Tweets in the two hours before 8:30 PM, Grim tweets only one thing between 8:30 and 11 PM:
This tweet is about Clinton’s chances in Florida, but it might as well be about the whole election.
Meanwhile Nate is posting up a storm. Both on 538’s live coverage and Twitter, he fires off tidbits left and right about Clinton’s longshot outs, the popular vote v. electoral college, and lessons for the next election. He sees his role not as predicting the future but explaining the statistics underlying it. So he does just that.
Still, he can’t help but get in one dig:
Over the next few hours it looks worse and worse. Florida and the midwestern states, though not yet officially called for Trump, look increasingly like they’re going his way. As each state slips away, Clinton’s paths to victory are cut off one by one.
As it all falls apart, Ryan Grim responds to Nate’s tweet and, to his immense credit, owns up:
Not all forecasters are as contrite. In his final post of the night, Sam Wang is apologetic in a “let’s say we all messed up equally” way:
11:44pm: The business about 65%, 91%, 93%, 99% probability is not the main point. The entire polling industry – public, campaign-associated, aggregators – ended up with data that missed tonight’s results by a very large margin. There is now the question of understanding how a mature industry could have gone so wrong. And of course, most of all, there is the shock of a likely Trump presidency. I apologize that I underestimated the possibility of such an event.
In contrast, Nate’s final take after Donald Trump is officially declared president:
In an extremely narrow sense, I’m not that surprised by the outcome, since polling — to a greater extent than the conventional wisdom acknowledged — had shown a fairly competitive race with critical weaknesses for Clinton in the Electoral College. It’s possible, perhaps even likely, that Clinton will eventually win the popular vote as more votes come in from California.
But in a broader sense? It’s the most shocking political development of my lifetime.
V. The Aftermath
After the shock of election night wears off, the other forecasters reflect.
Sam Wang comes around to a more complete apology:
In addition to the enormous polling error, I did not correctly estimate the size of the correlated error (also known as the systematic error) by a factor of five. As I wrote before, that five-fold difference accounted for the difference between the 99% probability here and the lower probabilities at other sites. We all estimated the Clinton win at being probable, but I was most extreme. … Polls failed, and I amplified that failure.
Natalie Jackson, the Huffington Post’s forecaster, also writes a post-mortem. There’s an undercurrent of bad blood here:
In the last week, I gave in to the negativity and began hitting back when Nate Silver of ESPN’s FiveThirtyEight would say that 90+ percent probability forecasts were unreasonable. I still don’t think that was a fair assessment.
Nonetheless, she goes on to say:
Silver was right to question uncertainty levels, and was absolutely correct about the possibility of systematic polling errors ― all in the same direction. Clearly, we disagree on how to construct a model, but he was right to sound the alarm.
Others in mainstream media, at least those who closely followed the forecasters, recognize that 538 did a good job. Articles in the Washington Post and NY Mag point out that Nate was the only one saying Trump had a real chance. Buzzfeed grades the forecasters in a surprisingly rigorous article. Both at face value (via giving Trump the highest chance of winning) and in statistical analysis (via Brier Scores), 538 outperforms everyone else.2
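For readers unfamiliar with the grading: a Brier score is just the mean squared error between a probability forecast and the 0-or-1 outcome, so lower is better. Here's a minimal sketch using the figures already mentioned in this piece - 538's roughly 3-in-10 chance for Trump versus Sam Wang's 1% - with Trump's win coded as 1. (The two-number comparison is illustrative; Buzzfeed's actual grading scored many state-level forecasts.)

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.
    Lower is better; a perfect forecast scores 0, a maximally wrong one scores 1."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Probability each model gave to "Trump wins", scored against the actual outcome (1):
fivethirtyeight = brier_score([0.30], [1])  # (0.30 - 1)^2 = 0.49
wang            = brier_score([0.01], [1])  # (0.01 - 1)^2 ~ 0.98

# 538's lower score reflects that it put far more weight on what actually happened.
print(fivethirtyeight, wang)
```

Note that a Brier score rewards both accuracy and honest uncertainty: confidently predicting the wrong outcome is penalized much more heavily than hedging.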
Nate, for his part, writes a series of explainers about the 2016 forecast. These are emphatically not apologies. Instead, they focus on facts like:
the 538 model gave Trump a 3-in-10 chance of winning.
the other major models gave Trump a much smaller chance of winning.
the national polls were only 1-2 points off the election result.
there were individual states (hello, Rust Belt) with bigger polling misses, but they were still within the historically expected range of misses.
Trump’s win, while surprising and in a moral sense ‘unthinkable’, should not have been a shock based on the data. Moreover, 538 provided a really good assessment of the risk. Just look at any of the articles 538 published right before the election.
But for all the Brier Scores, non-538 mea culpas, and explainers, people still remember that 538 said Clinton was more likely to win. To Nate’s chagrin, “more likely to win” gets read as “would win”, so what sticks is that 538 said Clinton would win. Even if they were less wrong than everyone else, the logic goes, weren’t they still wrong?
Worse, for those not closely following the race, 538, the other election forecasters, and the polls all get bundled together. The general public doesn’t remember which forecaster said what, or whether the forecasters ran the polls themselves. They were all pretty gung-ho on Clinton, right?
Mere hours after Nate Silver was shit on for being too bullish on Trump’s chances, you can find Tweets shitting on him for the opposite:
As time goes on, high-res memories of the 2016 election fade into an impressionist painting. 538, other election forecasters, and the polls merge in the popular consciousness. People who weren’t fighting with Nate during the 2016 election discover it’s a fun and easy way to dunk on him.
In part, people target 538 because it’s the poster child of election forecasting. One way or another, they all said Clinton was more likely to win, and 538 is the most obvious representative of that group.
Moreover, the other election forecasters fade away one by one. After the Huffington Post writes about “what [they’re] doing to prevent a repeat”, they ultimately avoid a repeat by silently canning their election forecasting team. As far as I can tell, Sam Wang, Slate, and Daily Kos make no presidential forecast in 2020. The NY Times Upshot has only 3 state-specific needles in 2020, and no overall presidential forecast. It’s all kind of sad. The huge ecosystem that sprang up vanishes, having learned that the forecasting game that looked so easy in 2012 could be brutal.
By 2020, 538 is the only major election forecaster still in the game. This change is a testament both to 538’s stature - it’s the only one with any credibility left - and to the whole field’s disrepute - even 538’s reputation is tainted. For all the grief it got, 538 comes out the biggest player in election forecasting, sullied but standing.
If you were to ask the general population to identify Nate Silver’s finest hour, few would say 2016.
Sure, there is a camp of people - those who were paying extra-close attention to Nate Silver and the other election forecasters in 2016 - who think Nate did a good or even great job in 2016.
But the vast majority of people didn’t pay that close attention. At best, most of us loosely paid attention by panic-refreshing 538 the week before Election Day. If asked about Nate Silver and 2016, the general population would say, “Oh, he got 2016 wrong, right? Maybe other people were even more wrong, but he still got it wrong.”
I think this is a shame.
Imagine you and your coworkers go out to dinner. You order a reasonably-priced dish. Your coworkers start ordering the most expensive things on the menu. You say, “hey, I’m not really comfortable splitting this.” They mock you. “Relax, we got this,” they say. Another coworker orders bottle service. You didn’t even know this place had bottle service. “Please, please stop,” you say, “this is totally irresponsible!” They mock you more: “Mr. Cheap over here!”
The bill comes. It’s $12,000. While you’re staring dumbfounded at the total, your coworkers sheepishly slip out on the bill. You can’t pay either. The restaurant, though furious, eventually agrees you only need to pay for your portion of the bill. But for years after, people won’t get dinner with you because they hear you racked up an enormous bill and refused to pay.
This would drive you crazy. You were responsible the whole way through. You ordered a reasonable dish that you could pay for. You tried everything, argued with your coworkers to stop their nonsense. Despite being in the right, you get stuck with the terrible reputation.
I imagine this is how Nate feels about 2016. He made a completely defensible forecast that showed Clinton likely winning and Trump with a solid chance. When Nate tried — correctly — to show that a 2016 Clinton victory wasn’t a done deal, the other forecasters and pundits mocked him as dumb or, worse, as putting his thumb on the scales. When Clinton lost, the other forecasters slinked off into the background and left him with a bad reputation.
I also think it’s a shame because 2016 should be a lesson in the virtues of thinking probabilistically.
Teaching the public to think probabilistically is Nate’s whole raison d'être. His comparisons to poker, use of odds, thinkpieces about overconfidence, they’re all to get us to understand that the future is inherently uncertain. He just wants us to quantify that uncertainty.
Unfortunately, we don’t like uncertainty. We think in outcomes, not distributions. And so his 30% forecast gets boiled down to “He said Clinton would win”, which, yes, is technically true in some sense.
Still, uncertainty is useful. The difference between Trump having a 1% chance and a 30% chance has huge implications. If I’m sure Trump will lose, then I can drink a celebratory glass of tap water and kick back. If I’m not so sure, maybe I should register voters or donate to the Clinton campaign. Having an accurate sense of your own certainty helps you decide what to do.
538 raising the alarm about Trump’s chances was really valuable. But instead of listening to this information, the pundits and forecasters downplayed and even insulted it.
Somewhere, maybe even before the 2016 election, Silver himself noted that by saying Trump had a 30% chance of winning, they were kinda screwed no matter what. If Clinton won, they’d look dumb in comparison to the Sam Wangs of the world. If Trump won, well, you get the world we’re in. And this world clearly traumatized Nate Silver.
Our world is set up to reward absolute certainty and punish probabilistic thinking. If you say something is 70% likely to happen and it doesn’t, people will say you fucked up. If you say something is 70% likely to happen and it does, you will be trashed by the people who were absolutely sure it was going to happen. And maybe those absolutely sure people were very smart, but often they’re just overconfident and lucky. The people who know the most realize how hard it is to know things.
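To make that concrete: a forecaster who says “70%” and is perfectly calibrated should be “wrong” about 30% of the time. That’s not a failure; it’s the definition of calibration. A quick simulation (purely illustrative, with a made-up seed):

```python
import random

random.seed(538)  # arbitrary seed, for reproducibility

# A perfectly calibrated forecaster assigns 70% to 10,000 independent events
# that each truly have a 70% chance of happening.
trials = 10_000
hits = sum(random.random() < 0.70 for _ in range(trials))

# The forecast "fails" roughly 3,000 times out of 10,000 - and is still flawless.
print(hits / trials)
```

Judging any single one of those 10,000 misses as proof the forecaster “fucked up” is exactly the mistake the 2016 coverage made with 538’s 30% chance for Trump.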
The world is full of uncertainty. Let’s reward the people who can see it.
In contrast with Grim’s claim, any stats person will tell you the election outcome doesn’t prove anybody right. Clinton winning doesn’t inherently mean Nate’s model was wrong. Nor does Trump winning inherently mean that the other forecasters, even Sam “99% Hillary Wins” Wang, had bad models. More on that later, though.
Again, while we can’t be sure that a model is ‘better’ just because it was more positive on Trump, I think it’s clear Nate’s model was better. The better Brier Scores certainly help, for one. Most of all, thanks to the pre-election debates, we get insight into the thinking behind the models. Nate’s thinking - that state outcomes are correlated, that the number of undecideds suggested uncertainty, and that Trump was only a standard polling error away from winning - was far more prescient than the other forecasters’ thinking.