A Prediction Market Trading Pit for the Digital Age: A Guest Post From David Pennock


David Pennock is one of the smartest guys I know.  As a scientist at Yahoo! Research, he’s on the bleeding edge of computer scientists working at the interface with economics.  His latest project, called Predictalot, is an amazing new prediction market which allows people to trade on the millions of possible outcomes of the Sweet Sixteen.  It’s a brilliant example of just why economists are going to have to get cozy with computer scientists.  And David has generously agreed to provide a guest post describing what he’s up to.  (And if you want more, he writes the always-interesting Oddhead Blog).

A Prediction Market Trading Pit for the Digital Age
By David Pennock

Prediction markets are a boon for information junkies. You can learn a lot by watching people vote with their wallets, even without opening your own. For example, here’s what the betting markets are predicting today:

  • There’s a 96-98 percent chance the Federal Reserve will keep rates unchanged at its April meeting (IEM)
  • There’s a 42-47 percent chance that 2012 will be the warmest year on record (intrade)
  • U.S. home prices aren’t expected to rebound until 2012, though they’re projected to be up 1-5 percent from Dec 2009 levels by 2015 (CME Group)
  • There’s a less than 50 percent chance Obama will win Ohio in 2012 (intrade)

An information junkie myself, I want more. For example, I’d like to know:

  • The chances that Obama will win both Ohio and Pennsylvania together
  • The chances that whoever wins California will win the election
  • The chances that the Republican candidate will win Nevada, Utah, and Arizona but not New Mexico

To get this kind of wild flexibility in what can be predicted, we need what are called combinatorial prediction markets, or markets where predictions are composed by combining different options in myriad ways.

Most prediction markets are one-dimensional, which means that every outcome is traded separately.  For instance, to predict “Obama will win between 269 and 312 electoral votes” in a one-dimensional market, you’d have to go in and buy each of the 44 intervening outcomes one at a time: 269, 270, 271, 272, … ugh! Combinatorial prediction markets allow traders to buy the full interval in one fell swoop. This bundling feature can be added with almost no downside as long as the number of outcomes is modest: say a few thousand or less.

But combinatorial prediction markets can have an unimaginably large number of outcomes. A U.S. election market might have over one quadrillion outcomes, one for every possible way the fifty states might swing. Think of the red and blue maps that are typically displayed by election pundits.  Now think of all the possible configurations of that map that might arise after the 2012 election. Since each of the fifty states can be colored two ways (ignoring third parties), in theory there are 2 to the power 50, or 1.13 quadrillion, possible outcomes! A combinatorial prediction market needs to track each of these possibilities!

People don’t naturally deal well with 1.13 quadrillion of anything. Traders want to predict high-level things like “Obama will win Ohio and Pennsylvania.” A combinatorial market does the heavy lifting behind the scenes, taking a prediction and automatically breaking it up into its component parts: in this case, all possible maps — over 250 trillion — in which Ohio and Pennsylvania are both blue. Of course, this process is an impossibly tedious task no (human) trader would ever undertake. Luckily, computers don’t mind at all; in fact, it’s just the kind of thing they excel at.
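To make the "breaking it up into component parts" step concrete, here is a minimal Python sketch. The five-state toy map and the function name `count_matching_maps` are illustrative assumptions, not anything from Predictalot. It enumerates every red/blue map by brute force, counts the ones consistent with a prediction, and checks the result against the closed form: fixing k of n states leaves 2 to the power (n - k) maps.

```python
from itertools import product

def count_matching_maps(states, must_be_blue):
    """Enumerate every red/blue map; count those where the given states are blue."""
    total = 0
    matching = 0
    for colors in product(("red", "blue"), repeat=len(states)):
        coloring = dict(zip(states, colors))
        total += 1
        if all(coloring[s] == "blue" for s in must_be_blue):
            matching += 1
    return matching, total

# Hypothetical five-state "country", small enough to enumerate by hand.
toy_states = ["OH", "PA", "FL", "NV", "CO"]
matching, total = count_matching_maps(toy_states, ["OH", "PA"])

# Brute force agrees with the closed form 2**(n - k).
assert total == 2 ** 5
assert matching == 2 ** (5 - 2)

# The same formula at full scale, where enumeration is hopeless.
full_maps = 2 ** 50        # every fifty-state map
oh_pa_blue_maps = 2 ** 48  # maps with Ohio and Pennsylvania both blue
print(f"{full_maps:,} maps in total, {oh_pa_blue_maps:,} with OH and PA blue")
```

The loop is fine for five states and useless for fifty, which is exactly the gap a combinatorial market has to bridge.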

These combinatoric problems are actually pretty common.  In fact, you’ve probably been thinking about a particularly complicated example recently, as you’ve filled out your March Madness brackets.  There are 9.2 quintillion possible brackets, one for every possible way the 64-team, 63-game tournament might unfold.  To put that number in perspective, there are estimated to be about 10 quintillion insects on the planet.  At Yahoo! Labs, we’ve built a combinatorial prediction market as a beta experiment. It’s called Predictalot, and right now you can make any of millions (OK, quadrillions) of predictions, including “Duke will advance further than Connecticut and Brigham Young” (current odds: 42.3 percent).

But there’s a real computational problem in running these markets: keeping track of 9.2 quintillion possible outcomes explicitly is too hard, even for today’s computers. Technically, the problem we’re trying to solve is #P-hard, meaning it is at least as hard as the canonical intractable problems in computer science, like the traveling salesman problem. So we use an approximation technique to estimate the odds, on the fly, for any prediction a user selects. Improving this approximation is an area of research that we’re still actively exploring. (At this point, I have to acknowledge the incredibly talented and dedicated research engineers who took the crazy idea of two scientists, myself and Daniel Reeves, and turned it into something real that’s fast, fun, pretty, and easy to use. Read more here.)
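The post doesn't say which approximation technique Predictalot uses, so the following is only a sketch of one generic option: plain Monte Carlo sampling. It assumes a hypothetical eight-team bracket in which every game is a fair coin flip, and estimates the odds of "A advances further than E" by simulating random brackets. Because the toy bracket has only 128 outcomes, the sampled estimate can be checked against exact enumeration; at 63 games that check is no longer possible, which is the point. All names here (`play_bracket`, `sampled_probability`, and so on) are made up for illustration.

```python
import random
from itertools import product

TEAMS = list("ABCDEFGH")  # hypothetical 8-team, 7-game bracket

def play_bracket(teams, game_results):
    """Play a single-elimination bracket and return {team: number of wins}.

    game_results is an iterator of booleans, one per game in playing order;
    True means the first-listed team in that game wins.
    """
    wins = {t: 0 for t in teams}
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            winner = a if next(game_results) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins

def advances_further(wins, team, rival):
    """The prediction: `team` wins strictly more games than `rival`."""
    return wins[team] > wins[rival]

def exact_probability(teams, predicate):
    """Enumerate all 2**(n-1) brackets; feasible only for tiny tournaments."""
    n_games = len(teams) - 1
    hits = sum(predicate(play_bracket(teams, iter(bits)))
               for bits in product((True, False), repeat=n_games))
    return hits / 2 ** n_games

def sampled_probability(teams, predicate, n_samples, rng):
    """Estimate the same probability from randomly sampled brackets."""
    n_games = len(teams) - 1
    hits = 0
    for _ in range(n_samples):
        bits = (rng.random() < 0.5 for _ in range(n_games))
        hits += predicate(play_bracket(teams, bits))
    return hits / n_samples

def prediction(wins):
    return advances_further(wins, "A", "E")

exact = exact_probability(TEAMS, prediction)
estimate = sampled_probability(TEAMS, prediction, 20_000, random.Random(0))
print(f"exact: {exact:.4f}  sampled: {estimate:.4f}")
```

A real market would sample from the current market prices rather than from coin flips, and would need the estimate to stay stable as trades arrive. Neither is attempted here.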

Why do we need or want combinatorial markets? Simply put, they allow us to collect more information. Combinatorial markets reveal the correlations among events (like Obama winning both Ohio and Pennsylvania), and not just their individual likelihoods. Understanding these correlations is key to many applications, including risk assessment. In fact, many people conjecture the financial crisis was exacerbated by a fundamental underestimation of the possibility of correlated failures.

Now these ideas aren’t just relevant to prediction markets.  They also translate to financial and betting exchanges, sports bookies, and racetracks.  But while these markets are modernizing — turning their operations over to computers and moving online — their core logic for processing orders hasn’t changed much in the last century since the days when auctioneers were people. These markets typically treat all outcomes like apples and oranges, processing them independently, even when they are related. For example, bets on a horse “to win” and “to finish in the top two” are managed separately at the racetrack, as are options to buy a stock at strike price 30 and strike price 20 on the Chicago Board Options Exchange. In both cases, it’s a logical truism that the first is worth less than the second, yet the market pleads ignorance, leaving it to traders to enforce consistent pricing. In a combinatorial market, a bet on Obama to win Ohio and Florida automatically affects the market price for that combination, and also for the possibility that he wins the Presidency, as it logically should.
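The racetrack example can be sketched in a few lines of Python. The idea is that if every bet is priced from one shared distribution over complete outcomes (here, full finishing orders), then "to win" can never cost more than "to finish in the top two", because every order in the first event also belongs to the second. The three horses and the weights are invented for illustration.

```python
from itertools import permutations

horses = ["A", "B", "C"]

# Hypothetical market weights on each complete finishing order
# (ABC, ACB, BAC, BCA, CAB, CBA); larger means more money behind it.
weights = dict(zip(permutations(horses), [6, 4, 3, 2, 2, 1]))
total_weight = sum(weights.values())

def price(event):
    """Price of a bet = total probability of the finishing orders it covers."""
    return sum(w for order, w in weights.items() if event(order)) / total_weight

def a_wins(order):
    return order[0] == "A"

def a_top_two(order):
    return "A" in order[:2]

# Both prices come from the same weights, so they cannot contradict each other.
assert price(a_wins) <= price(a_top_two)
print(f"A to win: {price(a_wins):.3f}  A in top two: {price(a_top_two):.3f}")
```

When the two bets are run as separate pools, nothing enforces that inequality except traders who notice the gap.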

A combinatorial market is a smarter market, letting humans and computers each do what they do best. People enter predictions in simple terms they understand. The computer handles the massive yet methodical number-crunching needed to combine all the pieces together into a coherent overall prediction of a complex event. Especially in the context of a prediction market, where the goal is to gather information, it makes sense to focus traders on providing their information, rather than wasting effort on finding and exploiting mispricings between related outcomes.

The learning curve in many of these prediction markets is still too steep. First-time traders can get lost in a maze of numbers, jargon, and definitions. By shifting some of that complexity into the central trading pit, the task of traders can, somewhat counter-intuitively, become easier and more natural, leveling the playing field and allowing a wider range of people to participate. Ultimately, that’s good for the overall ecosystem. And great for information junkies.


Allen Reynolds

Interesting, but still prediction markets are not odds any more than futures markets are predictions. That there is a less than 50% chance Obama will win Ohio is not a prediction. It is the current midpoint in a very thinly traded bid/ask where participants have motivations beyond pure speculation. For example, I might be long Obama because I want to hedge my tax risk. Or short because I just want to see analysts use this price as a "prediction".


Theory says there still should be some information contained in the markets. You're right that thin markets might have a very poor signal-to-noise ratio. And markets where people have a strong emotional reaction to outcomes, such as politics and sports, are probably noisier than most.

The interesting thing about this work is that it could be a way to thicken the market considerably. Each very specific bet contains information about a multitude of simpler bets that might, or might not, lead to better performance overall.


Very interesting. I'm curious, is the point system a zero sum exchange? It's not transparent how you make the market liquid for extremely specific bets, but maybe there's a way to make everything work out in the extremely high dimensional space you guys have.

I couldn't find anything about it looking through about 3 layers of info you guys have on it, but please let me know.


The California question seems dumb. Obama will win California unless he is replaced by another Democrat, since Democrats have such an entrenched lead in the state. Therefore the question should just be: will Obama win?

Greg Z

Interestingly, the biggest betting exchange operates single-event bets and multiple-event bets, the latter as an "independent" market, but it also offers multiples based on single bets (on relatively uncorrelated events), which it tries to hedge by taking positions in the singles markets.

David Pennock

Allen: You're right that technically what is revealed is the crowd's "risk-neutral" probability. But as long as there are some reasonably deep-pocketed speculators, they can earn an expected profit off of people hedging, so prices should converge toward true probabilities.

KevinH: you're exactly right that allowing bundle orders is a way to thicken the market. The market maker we are using for Predictalot is a positive-sum game: it's GMU economist Robin Hanson's "logarithmic market scoring rule" market maker. Except we have to approximate it since we can't compute it exactly over 9.2 quintillion states.

Mike: You're right that Obama has a very high chance of winning California, but it's probably not 100%. What if a moderate Republican from California is nominated? He or she might have a shot. Or other events we can't foresee.

Greg Z: are you referring to Betfair? Yes, betfair, intrade, IEM, CBOE, etc., treat each outcome in a multiple-outcome bet as its own independent market. There have been some exceptions over the years (e.g., Longitude, now defunct) but none of the big exchanges allow bundle bets on correlated events as far as I know. See http://bit.ly/multipm
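For readers curious about the logarithmic market scoring rule mentioned in the reply to KevinH, here is a minimal exact version in Python on a toy three-state election (8 outcomes). It uses the standard LMSR cost function C(q) = b * log(sum over outcomes of exp(q_i / b)), with an outcome's price equal to its share of that exponential sum. The class name, the liquidity parameter value `b=100`, and the three-state setup are assumptions for illustration. Predictalot approximates this rule rather than computing it exactly, and that approximation is not shown.

```python
import math
from itertools import product

class LMSRMarket:
    """Exact LMSR market maker over an explicitly listed outcome space."""

    def __init__(self, outcomes, b=100.0):
        self.b = b  # liquidity parameter (illustrative value)
        self.shares = {o: 0.0 for o in outcomes}

    def _weights(self):
        # exp(q_i / b), shifted by the max for numerical stability.
        m = max(self.shares.values()) / self.b
        return m, {o: math.exp(q / self.b - m) for o, q in self.shares.items()}

    def cost(self):
        """C(q) = b * log(sum_i exp(q_i / b))."""
        m, w = self._weights()
        return self.b * (m + math.log(sum(w.values())))

    def price(self, event):
        """Price of a bundle = sum of the prices of the outcomes it covers."""
        _, w = self._weights()
        return sum(v for o, v in w.items() if event(o)) / sum(w.values())

    def buy(self, event, amount):
        """Buy `amount` shares of every outcome in the bundle; return the charge."""
        before = self.cost()
        for o in self.shares:
            if event(o):
                self.shares[o] += amount
        return self.cost() - before

# Outcomes are (OH, PA, FL) colorings, "B" for blue and "R" for red.
market = LMSRMarket(list(product("RB", repeat=3)))

def oh_and_pa_blue(o):
    return o[0] == "B" and o[1] == "B"

def oh_blue(o):
    return o[0] == "B"

start_bundle, start_oh = market.price(oh_and_pa_blue), market.price(oh_blue)
charge = market.buy(oh_and_pa_blue, 50)
print(f"paid {charge:.2f}; OH&PA blue: {start_bundle:.3f} -> "
      f"{market.price(oh_and_pa_blue):.3f}; OH blue: {start_oh:.3f} -> "
      f"{market.price(oh_blue):.3f}")
```

One bundle purchase moves the price of the bundle and of every related prediction, such as "Ohio is blue", in a single step. The market maker's worst-case loss under this rule is bounded by b times the natural log of the number of outcomes, which is the subsidy that makes the game positive-sum for traders.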



Not so fast on assuming the 2012 presidential election results can be predicted as usual.

By 2012, The National Popular Vote bill could guarantee the Presidency to the candidate who receives the most popular votes in all 50 states (and DC).

Every vote, everywhere would be politically relevant and equal in presidential elections. Elections wouldn't be about winning states. Every vote, everywhere would be counted for and directly assist the candidate for whom it was cast. Candidates would need to care about voters across the nation, not just undecided voters in a handful of swing states.

In the 2012 election, pundits and campaign operatives already agree that only 14 states and their voters will matter under the current winner-take-all laws (i.e., awarding all of a state’s electoral votes to the candidate who receives the most popular votes in each state) used by 48 of the 50 states. Candidates will not care about 72% of the voters: voters in 19 of the 22 lowest population and medium-small states, and big states like CA, GA, NY, and TX. 2012 campaigning would be even more obscenely exclusive than 2008 and 2004. Candidates have no reason to poll, visit, advertise, organize, campaign, or care about the voter concerns in the dozens of states where they are safely ahead or hopelessly behind. Policies important to the citizens of ‘flyover’ states are not as highly prioritized as policies important to ‘battleground’ states when it comes to governing.

The bill would take effect only when enacted, in identical form, by states possessing a majority of the electoral votes--enough electoral votes to elect a President (270 of 538). When the bill comes into effect, all the electoral votes from those states would be awarded to the presidential candidate who receives the most popular votes in all 50 states (and DC).

The bill uses the power given to each state by the Founding Fathers in the Constitution to change how they award their electoral votes for president. Historically, virtually all of the major changes in the method of electing the President, including ending the requirement that only men who owned substantial property could vote and 48 current state-by-state winner-take-all laws, have come about by state legislative action.

In Gallup polls since 1944, only about 20% of the public has supported the current system of awarding all of a state's electoral votes to the presidential candidate who receives the most votes in each separate state (with about 70% opposed and about 10% undecided). Support for a national popular vote is strong in virtually every state, partisan, and demographic group surveyed in recent polls in closely divided battleground states: CO - 68%, FL - 78%, IA - 75%, MI - 73%, MO - 70%, NH - 69%, NV - 72%, NM - 76%, NC - 74%, OH - 70%, PA - 78%, VA - 74%, and WI - 71%; in smaller states (3 to 5 electoral votes): AK - 70%, DC - 76%, DE - 75%, ID - 77%, ME - 77%, MT - 72%, NE - 74%, NH - 69%, NV - 72%, NM - 76%, OK - 81%, RI - 74%, SD - 71%, UT - 70%, VT - 75%, WV - 81%, and WY - 69%; in Southern and border states: AR - 80%, KY - 80%, MS - 77%, MO - 70%, NC - 74%, OK - 81%, SC - 71%, VA - 74%, and WV - 81%; and in other states polled: CA - 70%, CT - 74%, MA - 73%, MN - 75%, NY - 79%, OR - 76%, and WA - 77%.

The bill has passed 31 state legislative chambers, in 21 small, medium-small, medium, and large states, including one house in AR, CT, DE, DC, ME, MI, NV, NM, NY, NC, and OR, and both houses in CA, CO, HI, IL, NJ, MD, MA, RI, VT, and WA. The bill has been enacted by DC, HI, IL, NJ, MD, MA, and WA. These 7 states possess 74 electoral votes — 27% of the 270 necessary to bring the law into effect.



David Pennock

I like this idea and had heard of it but didn't realize it had come so far -- thanks. Certainly that would change the election picture and reduce the interest/value in predicting individual states.

I was going to ask what is used for the ground truth measurement of the "most popular votes in all 50 states (and DC)", but I see it (and almost any other question one can think of) is answered on the website: http://www.nationalpopularvote.com/pages/answers/m20.php#m20_1


68* team tournament thanks to the new first four, usually not a big deal but with VCU, it kind of matters

David Pennock

Yes, you're right Kevin. We only modeled the "core 64" after the first four play-in games. Initially you could buy the combined team VCU/USC, then after the play-in game it became VCU.