Catching up on news about the Good Judgment Project

Season 2 of the IARPA tournament has flown by so quickly that we’ve been remiss about keeping readers abreast of news from the Good Judgment Project. Here are some highlights of the past several months.

Project co-leader Phil Tetlock is well known as the author of Expert Political Judgment, the popular 2005 book demonstrating that political experts often perform no better than chance when making long-term forecasts. Those who know Tetlock best in this context may be surprised by his assessment of the IARPA tournament in a December 2012 interview published at edge.org. There, he observed:

Is world politics like a poker game? This is what, in a sense, we are exploring in the IARPA forecasting tournament. You can make a good case that history is different and it poses unique challenges. This is an empirical question of whether people can learn to become better at these types of tasks. We now have a significant amount of evidence on this, and the evidence is that people can learn to become better. It’s a slow process. It requires a lot of hard work, but some of our forecasters have really risen to the challenge in a remarkable way and are generating forecasts that are far more accurate than I would have ever supposed possible from past research in this area.

Since that interview, the Good Judgment Team’s collective forecasts in the IARPA tournament have maintained a high standard of accuracy across topics ranging from “Who will be the next Pope?” to “Will Iran and the U.S. commence official nuclear program talks before 1 April 2013?” to “Will 1 Euro buy less than $1.20 US dollars at any point before 1 January 2013?”. Impressively, the Team’s forecasters are not, for the most part, subject-matter experts on these topics, but rather intelligent volunteers who research candidates for the papacy or Middle East politics in their “spare” time.

Our collective forecasts combine the insights of hundreds of forecasters using statistical algorithms that, ideally, help to extract the most accurate signal from the noise of conflicting predictions. Analyses of data from the first tournament season suggest that we can boost prediction accuracy by “transforming” or “extremizing” the group forecast. We described the process in an AAAI paper presented in Fall 2012:

Take the (possibly weighted) average of all the predictions to get a single probability estimate, and then transform this aggregate forecast away from 0.5….

Transformation increases the accuracy of aggregate forecasts in many, but not all cases. The trick is to find the transformation that will most improve accuracy in a particular situation. Preliminary results suggest that little or no transformation should be applied to the predictions of the most expert forecasters and to forecasts on questions with a high degree of inherent unpredictability. Stay tuned for further refinements as we analyze new data in Season 3.
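To make the idea concrete, here is a minimal sketch of a (possibly weighted) average followed by an extremizing step. The power-law form of the transform and the exponent value shown are illustrative assumptions for this sketch, not the specific algorithm or parameters from our AAAI paper.

```python
import numpy as np

def extremize(p, a=2.5):
    """Push an aggregate probability away from 0.5.

    One common form of extremization: a power-law rescaling of the odds.
    The exponent `a` is a hypothetical tuning parameter chosen for
    illustration, not a value from the Good Judgment analyses.
    """
    return p**a / (p**a + (1 - p)**a)

def aggregate(forecasts, weights=None, a=2.5):
    """Weighted mean of individual probability forecasts, then extremized."""
    p_bar = np.average(forecasts, weights=weights)
    return extremize(p_bar, a)

# Example: five forecasters lean toward "yes" but hedge individually.
forecasts = [0.65, 0.70, 0.60, 0.75, 0.68]
print(round(float(np.mean(forecasts)), 3))   # simple average, about 0.676
print(round(aggregate(forecasts), 3))        # pushed further from 0.5
```

Note that an exponent of 1 leaves the average untouched, which is consistent with the finding that little or no transformation helps for the most expert forecasters or for questions with a high degree of inherent unpredictability.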

This year, we’ve continued research comparing the accuracy of prediction markets to other methods of eliciting and aggregating forecasts, working with our new partner Lumenogic. Our Season 2 prediction-market forecasters use a Continuous Double Auction (“CDA”) trading platform similar to the now-defunct Intrade platform, but wagering only virtual dollars.
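For readers curious about the mechanics, the sketch below shows how a continuous double auction matches orders: bids and asks rest in an order book until an incoming order crosses the best price on the other side. The single-share orders, price conventions, and class layout here are simplifying assumptions for illustration only, not a description of the Lumenogic platform.

```python
import heapq

class MiniCDA:
    """Toy continuous double auction for one binary contract.

    Prices are virtual cents (0-100) and every order is for a single
    share, so an incoming order either trades against the best resting
    order on the other side or joins the book.
    """

    def __init__(self):
        self.bids = []  # max-heap via negated prices: (-price, trader)
        self.asks = []  # min-heap: (price, trader)

    def submit(self, side, price, trader):
        if side == "buy":
            if self.asks and self.asks[0][0] <= price:
                ask_price, seller = heapq.heappop(self.asks)
                return (trader, seller, ask_price)   # trade at resting price
            heapq.heappush(self.bids, (-price, trader))
        else:
            if self.bids and -self.bids[0][0] >= price:
                neg_bid, buyer = heapq.heappop(self.bids)
                return (buyer, trader, -neg_bid)
            heapq.heappush(self.asks, (price, trader))
        return None  # no match yet; order rests in the book

book = MiniCDA()
book.submit("buy", 62, "alice")          # rests in the book
print(book.submit("sell", 60, "bob"))    # crosses: ('alice', 'bob', 62)
```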

Forecasters have mixed feelings about trading as a method of prediction: some enjoy the challenge of predicting the ups and downs of forecaster sentiment; others prefer to focus on predicting the event in question and would rather not be bothered with understanding the views of their fellow traders. But it’s hard to argue with the overall performance of the two markets over the first several months of forecasting. The prediction markets have outperformed the simple average of predictions from forecasters in each of our survey-based experimental conditions (individuals working without access to crowd-belief data, and individuals working in small teams with access to their teammates’ forecasts). Only our “super-forecasters” (drawn from the top 2% of all participants in Season 1) have proven to be consistently more accurate than the prediction markets.

During Season 3, which begins in June 2013, we’ll be expanding our research team and our forecaster pool in hopes of coming ever closer to the theoretical limits of forecasting accuracy. We invite our current forecasters to continue with the Good Judgment Team for another season and encourage new participants to join us for this exciting challenge.
