The fundamental claim of prediction markets is that they produce accurate probability estimates by aggregating diverse information through financial incentives. But how accurate are they really? This guide examines the evidence — academic research, scoring metrics, real-world track records, and the conditions under which markets fail.
The Theoretical Foundation: Wisdom of Crowds
The accuracy of prediction markets rests on a theory popularized by James Surowiecki in his 2004 book The Wisdom of Crowds. The core idea: when a diverse group of people independently estimates a quantity, the average of their estimates tends to be more accurate than most individual estimates — including those of experts.
Prediction markets formalize this by attaching financial stakes. In a standard stock market, the efficient market hypothesis holds that prices reflect all available information because traders with superior information profit by trading on it. The same logic applies to prediction markets: if a contract is underpriced relative to the true probability, informed traders profit by buying, pushing the price toward the correct value.
Friedrich Hayek's 1945 insight about markets as information aggregation mechanisms applies directly. No single person knows everything relevant to a forecast, but prices synthesize dispersed knowledge — the political operative who knows turnout patterns, the economist who spotted a leading indicator, the local journalist who detected a shift in sentiment.
Measuring Accuracy: Brier Scores
The most widely used metric for evaluating probabilistic forecasts is the Brier score, developed by Glenn Brier in 1950. The Brier score is calculated as the mean squared error between predicted probabilities and actual outcomes:
Brier score = (1/N) * Σ (forecast_i - outcome_i)^2
where outcome_i is 1 (event happened) or 0 (event did not happen), and the sum runs over all N forecasts. Key reference points:
- 0.000: Perfect forecasting — every probability estimate exactly matched the outcome.
- 0.250: Random guessing — equivalent to assigning 50% probability to every binary event.
- 1.000: Perfectly wrong — assigning 100% probability to events that never happen (or 0% to events that always do).
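As a concrete illustration, here is a minimal sketch of the calculation in Python, using made-up forecasts and outcomes rather than figures from any platform:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Three hypothetical resolved questions: forecast probability vs. what happened.
forecasts = [0.80, 0.30, 0.95]
outcomes = [1, 0, 1]  # 1 = event happened, 0 = it did not
print(brier_score(forecasts, outcomes))  # ~0.0442, well below the 0.25 chance baseline
```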
Platform Brier Scores:
- Metaculus: Approximately 0.111 across its question database. This represents substantially better-than-chance forecasting with strong calibration.
- Manifold Markets: Approximately 0.168 across resolved markets. Good performance, though slightly weaker than Metaculus, likely reflecting its broader question range and more casual user base.
- Good Judgment Open: Approximately 0.149 for its aggregated forecasts, driven by trained superforecasters.
For context, professional meteorologists achieve Brier scores around 0.05-0.10 for short-range weather forecasting, which has decades of data and well-understood physical models. Prediction market scores of 0.11-0.17 for political, economic, and social events — where the underlying dynamics are far less predictable — represent impressive performance.
Academic Research
Wolfers and Zitzewitz (2004)
Justin Wolfers and Eric Zitzewitz published one of the foundational papers on prediction market accuracy, "Prediction Markets" in the Journal of Economic Perspectives. Their analysis concluded that prediction market prices were well-calibrated — events priced at 80% happened about 80% of the time — and that markets outperformed polls for election forecasting. They also documented that markets efficiently incorporated new information, with prices adjusting within minutes of major events.
Berg, Nelson, and Rietz (2008)
This study systematically compared Iowa Electronic Markets forecasts to polling averages across multiple US presidential elections. They found that IEM forecasts were more accurate than polls 74% of the time when compared on the same dates. The advantage was largest when forecasts were made well before election day — markets incorporated information about campaigns, candidates, and conditions that polls were slow to capture.
Arrow et al. (2008)
A remarkable 2008 letter published in Science and signed by economists Kenneth Arrow, Robert Forsythe, Michael Gorham, Robert Hahn, Robin Hanson, John Ledyard, Saul Levmore, Robert Litan, Paul Milgrom, Forrest Nelson, George Neumann, Marco Ottaviani, Thomas Schelling, Robert Shiller, Vernon Smith, Erik Snowberg, Cass Sunstein, Richard Thaler, Hal Varian, Justin Wolfers, and Eric Zitzewitz called for reducing regulatory barriers on prediction markets. The letter argued that the evidence for prediction market accuracy was strong enough to justify policy support for their development.
Tetlock and the Good Judgment Project
Philip Tetlock's research, documented in Superforecasting (2015), compared various forecasting methods head-to-head. The IARPA-funded Good Judgment Project found that the best individual forecasters ("superforecasters") performed comparably to prediction markets, with both dramatically outperforming typical domain experts.
Tetlock's data showed that domain experts — political scientists predicting elections, economists predicting recessions — performed only marginally better than chance when making probability estimates. Prediction markets and trained superforecasters, by contrast, showed genuine skill. The implication is clear: credentials and subject-matter expertise alone do not produce accurate forecasts. The information aggregation mechanism matters enormously.
Atanasov et al. (2017)
This study from the University of Pennsylvania compared prediction market forecasts to prediction polls across 500+ geopolitical forecasting questions. Markets beat polls scored as simple averages, though polls that added team collaboration and statistical aggregation matched or exceeded market accuracy; the markets' advantage was strongest on questions where diverse information sources were relevant and the market had adequate liquidity.
Real-World Track Record
US Elections
The 2024 US presidential election was the most dramatic validation of prediction market accuracy. In the final weeks before the election:
- Polymarket showed Trump at approximately 60-65% probability.
- Kalshi showed similar odds, with Trump favored.
- Major polling averages and poll-based forecast models (RealClearPolitics, FiveThirtyEight, The Economist) showed a near toss-up, with none giving either candidate much more than a 55% chance.
Trump won decisively, carrying all seven swing states. The prediction market consensus was materially closer to the actual outcome than the polling consensus.
This was not a one-off. Markets also outperformed polls in 2016, when most models gave Clinton an 85-95% chance of winning while markets showed a tighter race (though markets still had Clinton favored, they correctly indicated significantly more uncertainty). In the 2022 midterms, market-implied probabilities for the Senate were more accurate than most polling-based models.
Federal Reserve Decisions
Prediction markets on Fed interest rate decisions have an excellent track record. The CME FedWatch tool (based on federal funds futures, which are functionally prediction markets) and Kalshi's Fed markets typically converge on the correct decision days before the official announcement.
In 2023 and 2024, the markets correctly anticipated every Fed rate decision — including both the timing of pauses and the beginning of the cutting cycle. The markets are not perfect on timing, but they are well-calibrated: when markets price a 90% chance of a rate cut, the cut happens approximately 90% of the time.
COVID-19 Timelines
During the COVID-19 pandemic, Metaculus (a forecasting platform that aggregates predictions without real-money trading) and prediction market platforms produced useful forecasts for vaccine development timelines, case counts, and policy decisions. A Metaculus analysis found that its community forecasts for vaccine approval timelines were significantly more accurate than most public health expert predictions, which tended to be too pessimistic about the speed of development.
Economic Indicators
Markets on economic data releases (GDP growth, employment numbers, inflation) tend to track closely to the eventual outcome, though they rarely outperform the consensus of professional economic forecasters (the "Bloomberg consensus"). This is expected: for major economic indicators, professional forecasters already incorporate diverse information effectively, leaving less room for prediction markets to add value.
Calibration Analysis
Calibration is the most important dimension of forecast accuracy for prediction markets. A well-calibrated market is one where events priced at X% actually occur X% of the time.
Calibration analysis of major prediction market platforms reveals several patterns:
- Generally well-calibrated in the 20-80% range. Events priced at 30% happen roughly 30% of the time, and events priced at 60% happen roughly 60% of the time. The core of the probability distribution is where markets perform best.
- Slight overconfidence at extremes. Events priced at $0.90 (90% probability) tend to happen slightly less often than 90% of the time, perhaps 85-87%. Similarly, events priced at $0.10 tend to happen slightly more often than 10% of the time. This miscalibration at the extremes is well documented and is likely caused by a combination of transaction costs and the bounded nature of prediction market prices: correcting a few cents of mispricing near $0 or $1 ties up capital for a small return, so informed traders often let it stand.
- Better calibration with more liquidity. High-volume markets (US presidential elections, Fed decisions) are better calibrated than low-volume markets (niche political races, entertainment). This is expected: more traders means more information and faster price correction.
- Improving over time. As prediction markets have grown in participation and sophistication, calibration has improved. Data from 2024-2025 shows better calibration than data from 2020-2021 for comparable event types, likely reflecting a larger and more experienced trader base.
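To run this kind of check yourself, bin resolved markets by their final price and compare each bin's average price to the share of events that actually occurred. A minimal sketch in Python, using made-up (price, outcome) pairs rather than real platform records:

```python
from collections import defaultdict

def calibration_table(prices, outcomes, n_bins=10):
    """Bin forecasts by final price and compare each bin's mean price
    to the observed frequency of the event occurring."""
    bins = defaultdict(list)
    for p, o in zip(prices, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    rows = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_price = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        rows.append((mean_price, hit_rate, len(pairs)))
    return rows

# Hypothetical resolved markets: final price and whether the event occurred.
prices = [0.92, 0.88, 0.31, 0.12, 0.55, 0.61, 0.07, 0.78]
outcomes = [1, 1, 0, 0, 1, 0, 0, 1]
for mean_p, freq, n in calibration_table(prices, outcomes, n_bins=5):
    print(f"priced ~{mean_p:.2f} -> occurred {freq:.2f} (n={n})")
```

In well-calibrated data the two columns track each other; systematic gaps in the top and bottom bins are the overconfidence-at-extremes pattern described above.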
When Markets Fail
Despite their strong overall track record, prediction markets are not infallible. Understanding their failure modes is essential for interpreting prices correctly.
Thin Markets
The single biggest factor in market inaccuracy is low liquidity. A market with $2,000 in total volume and 15 active traders is not a reliable probability estimate — it is a small group's opinion. The wisdom-of-crowds effect requires a large, diverse crowd. When markets lack sufficient participation, prices can be noisy, manipulated, or simply wrong.
Rule of thumb: Be skeptical of market probabilities when total volume is below $50,000 or when fewer than 100 unique traders have participated.
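A screen based on that rule of thumb is straightforward to implement. The sketch below uses hypothetical market records; the field names, data, and thresholds are illustrative, not drawn from any platform's API:

```python
# Hypothetical market records; field names are illustrative.
markets = [
    {"question": "Niche House primary", "volume_usd": 1_800, "unique_traders": 14, "price": 0.72},
    {"question": "Fed cuts rates in March", "volume_usd": 4_200_000, "unique_traders": 9_300, "price": 0.91},
]

MIN_VOLUME = 50_000   # rule-of-thumb thresholds from the text above
MIN_TRADERS = 100

for m in markets:
    thin = m["volume_usd"] < MIN_VOLUME or m["unique_traders"] < MIN_TRADERS
    label = "thin market, treat price as a weak signal" if thin else "liquid"
    print(f'{m["question"]}: {m["price"]:.0%} ({label})')
```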
Manipulation
In 2024, there was significant debate about whether large traders were manipulating Polymarket election prices. While the most prominent example (the "Theo" account that placed $30 million on Trump) turned out to be correct, the broader question remains valid. In low-liquidity markets, a single well-funded trader can move prices substantially. Whether this constitutes "manipulation" or "informed trading" depends on whether the trader has genuine information or is simply trying to influence the perception of probabilities.
Research suggests that manipulation attempts in prediction markets are generally short-lived. If a manipulator pushes a price to an unrealistic level, informed traders profit by taking the other side, correcting the distortion. However, the correction process requires sufficient informed participants with capital to deploy — which brings us back to the liquidity requirement.
Novel and Unprecedented Events
Markets struggle with truly novel events — questions where there is no historical base rate and where existing mental models may not apply. Forecasting the probability of a global pandemic in January 2020, the likelihood of a specific AI capability milestone, or the chance of a previously unseen geopolitical event pushes markets into territory where the crowd's wisdom is limited by the crowd's experience.
Motivated Reasoning
Political prediction markets can be distorted by partisan trading — participants buying contracts on their preferred candidate not because they believe the probability is mispriced, but because they want to express support or are biased in their assessment. Research suggests this effect is real but limited: while some traders are politically motivated, profit-seeking traders correct these biases over time.
The Bottom Line
Prediction markets are not perfect, but the evidence strongly supports their effectiveness as forecasting tools. They outperform polls for elections, match or exceed expert forecasts for geopolitical and economic events, and produce well-calibrated probability estimates when they have sufficient liquidity.
The key takeaways for interpreting prediction market probabilities:
- Trust high-volume markets. Markets with millions in volume on major platforms are the most reliable signals.
- Be skeptical of thin markets. Low-volume, low-participation markets are weak signals at best.
- Check calibration, not just the final outcome. A market that assigns 70% to an event that does not happen was not necessarily "wrong"; the estimate was good if, across many similar predictions, events priced at 70% occurred about 70% of the time.
- Compare across platforms. When Kalshi, Polymarket, and other platforms agree, the signal is robust. When they diverge, investigate why.
- Combine with other sources. Prediction markets are most powerful when used alongside polling data, expert analysis, and statistical models — not as a replacement for them.

