I bookmarked this article on Slate last night because I found it intriguing: The polls of the future will be reproducible and open source. For a couple of clips and a brief summary of poll aggregators, read on:

The model runs in R and Stan – R is a general program for statistical data manipulation and graphics; Stan is a specialized program that performs Bayesian inference, which is an approach, based on probability theory, for combining information from multiple sources. (Disclosure: I am one of the developers of Stan.)

There are several beautiful things about the open-source nature of this forecasting website. First, the analysis is fully reproducible: Anyone can run Kremp’s script, grab the same data that Kremp is using, and then run the R and Stan analyses to produce the estimates and graphs on the webpage.

Second, the result itself is inherently collaborative. Instead of being anyone’s personal model, it’s Pierre-Antoine Kremp’s implementation of Drew Linzer’s model, which itself is based on much earlier work in the statistics and political science literature. And if someone wants to critique the analysis, change the model, or add or remove data, he or she can directly do so. Anyone who has a problem with it can open up the model, copy it onto her own computer, make whatever changes she wants, and then post her own version of the model, her own forecasts, and her own graphs. There’s no gatekeeper: You can put out your data and assumptions, and anyone can then make judgments from there. Different people might favor different models, and that’s fine. For example, analysts make different choices about adjusting for party identification of survey respondents; to the extent this information is available, it can be incorporated into the model. By anyone.

(snip)

This is not to disparage the pioneering work of Nate Silver and others who are operating their own aggregators and forecasts. They’ve taken quantitative election forecasting out of the political science ghetto and made probabilistic thinking part of the general conversation. Open-source forecasting is just the natural next step, bringing the principles of reproducible research to our understanding of elections.

Open source approaches to science are interesting in their own right, and this project applies the same idea to polling.

As for the current go-to aggregators, I will continue to recommend them for now. My primary go-tos have been Nate Silver’s FiveThirtyEight blog and Sam Wang’s Princeton Election Consortium. Wang specifically describes what he does as a form of meta-analysis: he offers an overall probabilistic summary based on all the available polling data. Silver is doing something broadly similar. What makes these models useful is that they let us ignore individual polls (much as a meta-analyst does not focus on individual research reports) and look at the bigger picture. Remember the ABC poll that had Trump ahead yesterday, the one that sent some here into a panic and left the Stein folks shivering with anticipation? In Silver’s model, it barely makes a dent in the overall estimate of Clinton’s and Trump’s shares of the popular vote. Silver noted that removing the ABC tracking poll from the aggregate raised Clinton’s estimated share of the popular vote by 0.1% and lowered Trump’s by 0.2%. In other words, what we gain is a stable estimate of the truth.
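To make the "one poll barely moves the aggregate" point concrete, here is a minimal sketch in Python. It is not Silver’s or Wang’s actual model; it is just a sample-size-weighted average. The poll numbers are invented for illustration, and the function name is my own.

```python
# Hypothetical polls: (Clinton %, Trump %, sample size).
# These numbers are invented for illustration; they are not real polling data.
polls = [
    (46.0, 42.0, 1000),
    (47.0, 43.0, 1200),
    (45.0, 41.0, 900),
    (48.0, 44.0, 1500),
    (45.0, 46.0, 1100),  # the outlier: the one poll with Trump ahead
]


def weighted_shares(polls):
    """Return the sample-size-weighted average share for each candidate."""
    total_n = sum(n for _, _, n in polls)
    clinton = sum(c * n for c, _, n in polls) / total_n
    trump = sum(t * n for _, t, n in polls) / total_n
    return clinton, trump


with_outlier = weighted_shares(polls)
without_outlier = weighted_shares(polls[:-1])  # drop the outlier poll

print("All polls:        Clinton %.2f, Trump %.2f" % with_outlier)
print("Outlier removed:  Clinton %.2f, Trump %.2f" % without_outlier)
```

Even with only five made-up polls, the aggregate still shows Clinton ahead when the outlier is included. The real aggregators pool far more polls and use more careful weighting, so a single poll moves their estimates even less.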

These models are probabilistic because we know that polls include sampling error, along with any number of other methodological flaws. But they give us some ability to assess what should be as close an approximation of the truth as is humanly possible. And these aggregators generally have a good track record. I’ll be curious to see how those going in a more open source direction work out, but to the extent that they encourage what scientists call replicability (the ability to independently reproduce and verify results based on the same openly available methodology), we as poll consumers will be better served. My advice for now: ignore the shiny objects and look at the aggregators – open source and otherwise.
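For anyone who wants a number on the sampling error mentioned above, here is a rough sketch using the textbook 95% margin of error for a proportion. It assumes a simple random sample, which real polls only approximate, and it ignores every other methodological flaw. The function name and sample sizes are my own illustrative choices.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a share p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


# One hypothetical poll of 1,000 respondents with a candidate near 50%.
single_poll = margin_of_error(0.5, 1000)

# Ten such polls pooled, under the strong assumption that they behave
# like one independent random sample of 10,000.
pooled = margin_of_error(0.5, 10000)

print("Single poll: +/- %.1f points" % (100 * single_poll))
print("Pooled:      +/- %.1f points" % (100 * pooled))
```

The pooled figure overstates how much certainty real aggregation buys, since polls share biases that averaging does not remove. That is one reason the aggregators report probabilities instead of a single number.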
