Disagree. It’s possible to get a good calibration chart in unimpressive ways, but that’s not how Polymarket & Manifold got their calibration, so their calibration is impressive.
To elaborate: It’s possible to get a good calibration graph by only predicting “easy” questions (e.g. the p-weighted coin), or by predicting questions that are gameable if you ignore discernment (e.g. 1⁄32 for each team to win the Super Bowl), or with an iterative goodharting strategy (e.g. seeing that too many of your “20%” forecasts have happened so then predicting “20%” for some very unlikely things). But forecasting platforms haven’t been using these kinds of tricks, and aren’t designed to. They came by their calibration the hard way, while predicting a diverse set of substantive questions one at a time & aiming for discernment as well as calibration. That’s an accomplishment.
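Here’s a quick simulation of the first two tricks, just to make them concrete (the question mix and counts are made up for illustration; this is not anything a real platform does):

```python
import random

random.seed(0)

forecasts = []  # (stated probability, outcome) pairs

# Trick 1: only predict "easy" questions. Forecasting p-weighted coins
# at their known bias p gives perfect calibration with zero discernment.
for _ in range(10_000):
    p = random.choice([0.1, 0.3, 0.5, 0.7, 0.9])
    forecasts.append((p, random.random() < p))

# Trick 2: gameable structure. Put 1/32 on each of 32 teams; exactly one
# wins, so the "1/32" bucket resolves at a 1/32 rate by construction.
for _ in range(1_000):
    winner = random.randrange(32)
    forecasts.extend((1 / 32, team == winner) for team in range(32))

# Calibration chart: stated probability vs. observed resolution rate.
buckets: dict[float, list[bool]] = {}
for p, outcome in forecasts:
    buckets.setdefault(p, []).append(outcome)

for p in sorted(buckets):
    hits = buckets[p]
    print(f"predicted {p:.3f} -> resolved {sum(hits) / len(hits):.3f} (n={len(hits)})")
```

Every bucket comes out nicely calibrated, even though no discernment went into a single forecast. My point is that this is not how the platforms got their charts.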
You skip over the not-very-impressive way for a prediction market platform to be calibrated that I already mentioned: if things predicted at 20% actually happen 30% of the time, you can buy up random markets that are at 20% and profit.
That seems like an instance of a general story for why markets are good: if something is priced too low, people can buy it up and make a profit. It’s a not-very-impressive way for markets to be impressive.
If you’d said “not surprising” instead of “not impressive”, then maybe I would’ve been on board. It’s not that surprising that prediction markets are good at calibration, because we already knew that markets are good at that sort of thing. That seems basically true, for certain groups of “we”. Though my attitude is still more “check it out: it works like we thought it would” than “nothing to see here, this is just what we expected”.
What I’m getting at is this: it seems to me the predictions given by the platform can be almost arbitrarily bad, yet under some assumptions the above strategy will still work and will make the platform calibrated. So calibration doesn’t imply anything about the goodness of the predictions, and so it’s not impressive.
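A minimal sketch of that claim, under simplifying assumptions I’m making up (prices start as pure noise, and traders bid each price bucket to its historical resolution rate):

```python
import random

random.seed(0)

N = 100_000
BASE_RATE = 0.5  # assumed overall frequency of YES resolutions

# The platform's prices are pure noise: each market gets a random price
# bucket that carries no information about whether it resolves YES.
prices = [round(random.random(), 1) for _ in range(N)]
outcomes = [random.random() < BASE_RATE for _ in range(N)]

# The correction strategy: traders see each bucket's historical resolution
# rate and bid every market in that bucket up (or down) to that rate.
by_bucket: dict[float, list[bool]] = {}
for p, y in zip(prices, outcomes):
    by_bucket.setdefault(p, []).append(y)
corrected = {p: sum(ys) / len(ys) for p, ys in by_bucket.items()}

# Result: each corrected price matches its bucket's resolution rate, so the
# platform is calibrated; but every corrected price is ~the base rate, so
# the prices say nothing about which markets will resolve YES.
for p in sorted(corrected):
    print(f"bucket {p:.1f} -> corrected price {corrected[p]:.3f}")

brier = sum((corrected[p] - y) ** 2 for p, y in zip(prices, outcomes)) / N
brier_const = sum((BASE_RATE - y) ** 2 for y in outcomes) / N
print(f"Brier with corrected prices: {brier:.4f}")
print(f"Brier just guessing the base rate: {brier_const:.4f}")
```

The two Brier scores come out essentially equal: the arbitraged prices are calibrated, yet no more informative than guessing the base rate everywhere.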