Friday, November 9, 2012

Nate Silver and 538


Before I go any further, let's get one thing out of the way. Nate Silver is a genius. He is skilled not only in manipulating data but also in providing explanations for how he manipulated data. Silver gets plenty of credit for the first thing and way too little credit for the second thing.

My little niche in the world has been carved out writing about fantasy baseball and applying math to player valuation. By trade, I've spent the last few years of my career applying math and statistics to health care analytics. Using statistics properly is difficult enough. Using statistics well and explaining to someone how you got there is climbing Mount Everest. Silver's 538 blog not only climbs that mountain, it reaches the summit.

The knee-jerk reaction to Tuesday’s election results was to praise Silver as the Dean Emeritus of the Electoral College. Some were amazed that Silver nailed 49 out of 49 states, and will probably get Florida right as well.

While there's no doubt that Silver did a terrific job, the surprised reaction of the masses came because of last-second projections from far less analytical sources. Conor Friedersdorf of The Atlantic has a good piece on this phenomenon. To some on the right it was difficult to believe that Barack Obama could possibly clear 250 electoral votes, let alone win an election. In reality, the opposite was the case. All along, Mitt Romney had a difficult path to the Presidency where the electoral votes were concerned.

To be fair, there were voices on the right that either thought Obama was the favorite or that Romney was only a slight favorite who didn't have a realistic path to 300 electoral votes. Ted Frank at Point of Law posted a well-thought-out and rational piece a week before the election explaining his issues with Silver's methodology. While I don't agree with many of Frank's conclusions, unlike a lot of those on the right, Frank offered a thoughtful rationale that went beyond "well it can't possibly be correct that Obama is winning by that much."

The right also seemed to get stuck on the idea that because Silver's model gave Obama a 91% chance of winning, Obama was going to win in a landslide. But this wasn't what Silver was saying at all. His system was estimating the probability of an Obama victory at 91%; it was not saying anything about the size of that victory. That nine percent chance for Mitt Romney wasn't trivial. To use an apt analogy that Bret Sayre of Fake Teams provided on Twitter, a home team in baseball with a one-run lead, a runner on third base, and two outs in the top of the ninth has an 87% chance of winning. The game is close, but the odds of the road team winning are poor.
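To make the difference between "probably wins" and "wins big" concrete, here is a minimal sketch. It is not Silver's method. It simply assumes the final margin is normally distributed around a projected margin, and the spread (`sd_points`) is a number I picked purely for illustration.

```python
from statistics import NormalDist


def win_probability(projected_margin, sd_points):
    """Chance that the final margin ends up above zero.

    Assumes (for illustration only) that the final margin is normally
    distributed around projected_margin with standard deviation sd_points.
    """
    return 1.0 - NormalDist(mu=projected_margin, sigma=sd_points).cdf(0.0)


# A modest 2.5-point projected lead with a hypothetical 1.9-point spread.
p = win_probability(2.5, 1.9)
print(f"Win probability: {p:.0%}")
```

Under those assumptions a lead of only a couple of points already translates into a win probability of roughly nine in ten, which is the same shape as the baseball analogy: a close game that one side is still very likely to win.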

The most significant problem with the right's analysis is that on a basic level using state polling data works. Nate Silver isn't the only analyst who uses state polling data to try to predict the outcome of elections. Election Projection and Electoral Vote both rely heavily on state projections. Unlike Silver, they don't shake and bake with the numbers. Both sites aggregate state polls going back about a week. Both sites predicted an Obama victory, 303-235. The only state both sites got wrong was Florida, which Silver (probably) got right.

As a fun exercise, I went back even further than Election Projection and Electoral Vote and calculated a leader based on a month's worth of polls, using only one poll (the most recent one) per polling firm. This methodology is generally considered a poor one because older polls are "stale" and don't take into account recent events that could tip the scales in one direction or the other. How does this "neutral" model do compared to Silver's more rigorous model?

Nate Silver's 538 model versus "neutral" state poll model

| State | Raw Polling Average | Silver/538 | Actual |
| --- | --- | --- | --- |
| California | Obama +14 | Obama +17.4 | Obama +20.5 |
| Colorado | Obama +1.1 | Obama +2.5 | Obama +4.7 |
| Connecticut | Obama +12.7 | Obama +14.1 | Obama +17.8 |
| Florida | Romney +1.3 | Obama 0 | Obama +0.6 |
| Iowa | Obama +2 | Obama +3.2 | Obama +5.6 |
| Massachusetts | Obama +19.7 | Obama +19.1 | Obama +22.8 |
| Michigan | Obama +3.6 | Obama +7.1 | Obama +8.5 |
| Minnesota | Obama +5.7 | Obama +8.6 | Obama +6.4 |
| Missouri | Romney +10.8 | Romney +8.1 | Romney +9.6 |
| Nevada | Obama +2.6 | Obama +4.5 | Obama +6.6 |
| New Hampshire | Obama +1.8 | Obama +3.5 | Obama +5.8 |
| North Carolina | Romney +2.7 | Romney +1.7 | Romney +2.2 |
| Ohio | Obama +2.8 | Obama +3.6 | Obama +1.9 |
| Pennsylvania | Obama +3.8 | Obama +5.9 | Obama +5.2 |
| Virginia | Obama +0.2 | Obama +2 | Obama +3 |
| Washington | Obama +13.6 | Obama +13.6 | Obama +13.3 |
| Wisconsin | Obama +4.7 | Obama +5.5 | Obama +6.7 |
| National | Obama +1.7 | Obama +2.5 | Obama +2.5 |

I included every state that had at least five polls listed at Real Clear Politics in the 30 days leading up to the election.
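For anyone who wants to reproduce the "neutral" average, the steps can be sketched roughly as follows. This is a reconstruction and not the exact spreadsheet work behind the table: the tuple layout, the function name, and the sample polls are all made up for illustration, and I am assuming the five-poll minimum is counted before older polls from the same firm are dropped.

```python
from datetime import date, timedelta

ELECTION_DAY = date(2012, 11, 6)


def neutral_margin(polls, election_day=ELECTION_DAY, window_days=30, min_polls=5):
    """Average the most recent poll per firm within the window.

    polls: iterable of (firm, end_date, margin), where margin is the Obama
    lead in points (negative means Romney leads).
    Returns None when the state has fewer than min_polls polls in the window.
    """
    start = election_day - timedelta(days=window_days)
    in_window = [p for p in polls if start <= p[1] <= election_day]
    if len(in_window) < min_polls:  # assumption: counted before de-duplicating
        return None

    latest = {}  # firm -> (end_date, margin) of that firm's newest poll
    for firm, end_date, margin in in_window:
        if firm not in latest or end_date > latest[firm][0]:
            latest[firm] = (end_date, margin)

    margins = [margin for _, margin in latest.values()]
    return sum(margins) / len(margins)


# Invented polls for a hypothetical state, not real 2012 data.
sample = [
    ("Firm A", date(2012, 10, 10), 1.0),
    ("Firm A", date(2012, 11, 1), 3.0),   # replaces Firm A's older poll
    ("Firm B", date(2012, 10, 20), 2.0),
    ("Firm C", date(2012, 10, 30), -1.0),
    ("Firm D", date(2012, 11, 3), 4.0),
    ("Firm E", date(2012, 9, 1), 10.0),   # outside the 30-day window
]
print(neutral_margin(sample))
```

Nothing here weights polls by recency, sample size, or pollster quality, which is exactly what makes the model "neutral" and what separates it from what Silver does.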

While the right kept vaguely and not-so-vaguely accusing Silver of "cooking the books" in the days leading up to the election, Silver's model may not have been aggressive enough. In 11 of the 15 states where Silver had Obama winning, the actual margin was even higher than Silver predicted. The raw polling model was even more tepid on Obama than Silver's model, lagging behind him in 13 of 15 states.
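Those counts can be checked directly against the table. The short script below just re-types the state rows (positive numbers are Obama leads, negative are Romney leads) and tallies the comparisons. Treating Florida's "Obama 0" as an Obama state is my reading of the table, and the variable names are only for illustration.

```python
# (raw polling average, Silver/538, actual); positive = Obama, negative = Romney
rows = {
    "California": (14.0, 17.4, 20.5),
    "Colorado": (1.1, 2.5, 4.7),
    "Connecticut": (12.7, 14.1, 17.8),
    "Florida": (-1.3, 0.0, 0.6),
    "Iowa": (2.0, 3.2, 5.6),
    "Massachusetts": (19.7, 19.1, 22.8),
    "Michigan": (3.6, 7.1, 8.5),
    "Minnesota": (5.7, 8.6, 6.4),
    "Missouri": (-10.8, -8.1, -9.6),
    "Nevada": (2.6, 4.5, 6.6),
    "New Hampshire": (1.8, 3.5, 5.8),
    "North Carolina": (-2.7, -1.7, -2.2),
    "Ohio": (2.8, 3.6, 1.9),
    "Pennsylvania": (3.8, 5.9, 5.2),
    "Virginia": (0.2, 2.0, 3.0),
    "Washington": (13.6, 13.6, 13.3),
    "Wisconsin": (4.7, 5.5, 6.7),
}

# States where Silver had Obama winning (Florida's "Obama 0" is included).
obama_states = {s: v for s, v in rows.items() if v[1] >= 0}

actual_beat_silver = sum(1 for raw, silver, actual in obama_states.values() if actual > silver)
raw_below_silver = sum(1 for raw, silver, actual in obama_states.values() if raw < silver)

print(len(obama_states), actual_beat_silver, raw_below_silver)  # 15 11 13
```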

Silver's model worked very well. But we shouldn't be too surprised. State polling-based models have worked very well in the last three election cycles. If you had used raw polling averages in 2004 and 2008, you would have correctly predicted the winner in 49 out of 50 states both times, with only Indiana (2008) and Ohio (2004) being off the mark. If you had used a time frame shorter than 30 days in 2004, you would have predicted a George Bush victory; John Kerry was ahead in only one poll in the last few days leading up to that election.

The "surprise" of Silver doing so well with his model has more to do with a lack of belief in data than with anything that Silver is or isn't doing. While the right's apoplectic reaction has magnified this perception, Presidential election coverage in general has always tried to present the state of the race as something that is difficult to get a feel for and could go in any direction. While this is theoretically true, we now have three Presidential election cycles where state polling has offered us an excellent baseline to predict what will ultimately happen on Election Day.