Is trading volume on prediction markets dominated by thoughtful traders or gamblers? It only takes one thoughtless gambler with a large bankroll to outweigh a bunch of high signal traders. Or a bunch of low signal traders with a collectively large bankroll to outweigh a high signal trader.
Based on what I see in the comments on the two sites, Metaculus is clearly higher signal regarding hantavirus. And that’s quite counterintuitive if you expect Polymarket’s got lots of thoughtful traders involved. Shouldn’t the comments on Polymarket be full of persuasive arguments by traders trying to convince others to come around to their position and increase the value of the shares they own? Why do we see this only on Metaculus, where the only reward is internet points?
It only takes one thoughtless gambler with a large bankroll to outweigh a bunch of high signal traders. Or a bunch of low signal traders with a collectively large bankroll to outweigh a high signal trader.
yes—but one of the nice fundamental properties of prediction markets is that over time, thoughtful people accumulate larger bankrolls
and yes, I think Metaculus comments are generally quite good; Manifold’s are sometimes good, Polymarket/Kalshi are approximately garbage. I think this is partly cultural effects (and product decisions) about who comments vs trades and how those get represented, but yes, it also reflects something important about the distribution of the underlying audience: Metaculus has a handful of extremely thoughtful forecasters, while Polymarket may have that plus a thousand degen gamblers. my contention, again, is that the structure of markets happily means that the latter can still be quite accurate.
I don’t know if you’ve seen https://brier.fyi/ (and, imo their results should be taken with a grain of salt, though also I might just be salty); but my main takeaway is that they’re all pretty calibrated, and broadly could be cited much more (whether market or poll)
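Going back to the bankroll point above, here’s a toy Monte Carlo sketch of the mechanism (entirely my own construction; the 60% true rate, the static 0.5 market price, and both traders’ beliefs are invented assumptions, not anything from this thread). An accurate trader betting Kelly-optimally compounds capital, while a miscalibrated trader betting the same size on the wrong side bleeds out:

```python
import random

random.seed(0)

P_TRUE = 0.6   # true YES probability each round (my assumption)
PRICE = 0.5    # price offered by the rest of the market (assumed static)

def kelly_fraction(belief, price):
    """Kelly-optimal fraction of bankroll to stake on YES at this price."""
    return max(0.0, (belief - price) / (1 - price))

thoughtful, gambler = 1.0, 1.0  # starting bankrolls
for _ in range(2000):
    yes = random.random() < P_TRUE
    # Thoughtful trader: correct belief, buys YES shares at 0.5.
    f = kelly_fraction(P_TRUE, PRICE)          # = 0.2
    thoughtful *= (1 - f + f / PRICE) if yes else (1 - f)
    # Gambler: wrongly believes YES is 40%, stakes the same fraction on NO.
    g = kelly_fraction(1 - 0.4, 1 - PRICE)     # = 0.2, on the NO side
    gambler *= (1 - g + g / (1 - PRICE)) if not yes else (1 - g)

print(f"thoughtful bankroll: {thoughtful:.3g}, gambler bankroll: {gambler:.3g}")
```

This deliberately ignores the fact that real markets move the price as people bet, so it’s only the compounding intuition, not a market model.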
yes—but one of the nice fundamental properties of prediction markets is that over time, thoughtful people accumulate larger bankrolls
It’s true that superforecasters outperform on forecasting in general. But there are several major issues with the assumption that their increased bankroll can be relied upon to improve the signal:
Superforecasters are a small population with a finite amount of time and capital. The number of questions and overall volume on prediction markets may easily grow faster than the number of superforecasters and their bankroll, resulting in a trend toward less accurate markets over time as the superforecasters are spread thin.
Superforecasters robustly outperform across many questions, but on particular classes of questions they may underperform people with idiosyncratic, topic-specific expertise. Even if subject-matter experts could make a lot of money on prediction markets where their expertise gives them a particular edge, we may not in fact see sufficient participation by those experts to make up for the low-signal betting. And because their expertise is relevant only occasionally, they may not accumulate a large enough bankroll to outweigh the noise even when it is relevant.
The hurdles to participating in prediction markets mean that even the most thoughtful participants are selected from a smaller talent pool, biased toward the kind of people drawn to prediction markets. Because of that selection effect, the asymptote of accuracy that prediction markets approach over time, counting on their most thoughtful participants to outweigh the low-signal traders, may nevertheless sit below what the most thoughtful forecasters in other venues (Metaculus, traditional expert institutions, etc.) achieve.
I don’t know if you’ve seen https://brier.fyi/ (and, imo their results should be taken with a grain of salt, though also I might just be salty); but my main takeaway is that they’re all pretty calibrated, and broadly could be cited much more (whether market or poll)
Interestingly, my qualitative impression from scrolling through the complete list of questions and their midpoint Brier scores is that the highest-volume questions have the worst midpoint Brier scores.
I’m not clear on how they handle aggregation when only a subset of platforms are involved in a linked market (a lot of the markets are only Manifold vs. Metaculus, for example). I don’t know their inclusion criteria for these linked markets, and I also don’t know how the selection effects from excluding questions that only one platform ran should affect our interpretation of their findings.
Then there’s the problem of grading on the “midpoint Brier score.” This creates the appearance of high calibration on questions that resolved, or were clearly on track to resolve, before the midpoint, whether by chance or because there’s little incentive to buy and hold extremely high-probability outcomes. Hence, sorting by midpoint Brier score, we get calibration grades of “A” on events like “will China launch a full-scale invasion of Taiwan in 2023” (near-0% probability throughout the question) or “will Sam Altman return to OpenAI” (leapt to near 100% before the midpoint). We also see a huge enrichment of questions about political victories; I’m not familiar enough with these candidates, but I’d speculate they were just safe seats. Questions with “F” calibration often involve highly unpredictable events: sports victories, movie awards, executive and regulatory decisions.
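To make the midpoint-grading complaint concrete, here’s a minimal sketch (the probabilities are invented stand-ins for the examples above, not brier.fyi’s actual data). A question that’s effectively settled by its midpoint scores near-perfectly, while an honestly uncertain one can’t score well even in principle:

```python
def brier(prob, outcome):
    """Brier score: squared error between forecast prob and the 0/1 outcome."""
    return (prob - outcome) ** 2

# Hypothetical midpoint probabilities standing in for the examples above:
print(brier(0.01, 0))  # "Taiwan invasion" style, near-0 throughout: ~0.0001
print(brier(0.98, 1))  # effectively resolved before the midpoint: ~0.0004
print(brier(0.50, 1))  # genuinely uncertain question: 0.25, the floor for a 50/50
```

So sorting by midpoint Brier largely ranks questions by how foregone their conclusions were, not by how well the platform forecast hard questions.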
No clear takeaways here; I’m just confused about their question selection and grading methodology and don’t feel confident about how to interpret the blog post.
(thanks)
*one thoughtful trader with a large bankroll to outweigh a bunch of low signal traders.
(granted, at the tails, it does become more expensive to push the odds further down. e.g. at 5%, you’re paying $19 per $1 downwards)
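One way to read the “$19 per $1” figure (my interpretation, not necessarily the author’s): a NO share at probability p costs (1 − p) and pays out $1, so the capital risked per dollar of profit is (1 − p)/p, which blows up at the tails:

```python
def cost_per_dollar_of_profit(p):
    """Capital risked per $1 of profit when buying NO at probability p."""
    return (1 - p) / p

print(cost_per_dollar_of_profit(0.05))  # ~19: the "$19 per $1" at 5%
print(cost_per_dollar_of_profit(0.50))  # 1: even odds, symmetric risk
print(cost_per_dollar_of_profit(0.01))  # ~99: the tails get steep fast
```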
But that’s the empirical question, right?