I’m always interested in knowing why people disagree with me, but recognize that people have limited motivation to expend the effort to explain to me why I am wrong in a way I can understand.
In case it helps reduce that effort: I am permanently committed to Crocker’s rules.
Also, the LW user artifex0 is a different person.
Wait, unless I misunderstand you, there’s a reasoning mistake here. You request epistemic credit for implicitly predicting that the Metaculus median would drop by five years at some point in the next three years. But that’s a prediction the majority of Metaculites would also have made, and over an interval as long as three years it was practically a given. It’s a correct advance prediction, if you did make it (let’s assume so and not get into inferring implicit past predictions by retrospective text analysis), but it’s not one that is even slightly impressive.
As an example to explain why, I predict (with 80% probability) that there will be a five-year shortening in the median on the general AI question at some point in the next three years. And I also predict (with 85% probability) that there will be a five-year lengthening at some point in the next three years.
I’m predicting both that Metaculus timelines will shorten and that they will lengthen! What gives? Well, I’m predicting volatility… Should I be given much epistemic credit if I later turn out to be right on both predictions? No: the volatility is very predictable, and you don’t need to be a good forecaster to anticipate it. If you think you deserve some credit for your prediction, then I deserve much more for these two. But neither of us should get much.
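To make the point concrete, here is a toy simulation. The model is entirely my own assumption (a driftless random walk with a made-up monthly volatility, not fitted to any Metaculus data), but it shows why paired "it will shorten" and "it will lengthen" predictions are cheap: under almost any reasonable volatility, both five-year swings happen within three years most of the time.

```python
import random

# Toy model (my assumption, not real Metaculus data): the community median
# on a long-range AGI question drifts like a random walk, moving by a
# normally distributed amount each month as news arrives and people update.
random.seed(0)

MONTHS = 36          # the three-year window from the comment
MONTHLY_SD = 3.0     # assumed volatility of the median, in years
N_PATHS = 10_000

drop = rise = both = 0
for _ in range(N_PATHS):
    level = lo = hi = 0.0
    for _ in range(MONTHS):
        level += random.gauss(0.0, MONTHLY_SD)
        lo = min(lo, level)
        hi = max(hi, level)
    d = lo <= -5.0   # median fell by 5+ years at some point
    r = hi >= 5.0    # median rose by 5+ years at some point
    drop += d
    rise += r
    both += d and r

p_drop, p_rise, p_both = drop / N_PATHS, rise / N_PATHS, both / N_PATHS
print(f"P(5y drop): {p_drop:.2f}  P(5y rise): {p_rise:.2f}  P(both): {p_both:.2f}")
```

Being "right" about both swings under a model like this reflects the volatility parameter, not forecasting skill.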
Are there inconsistencies in the AGI questions on Metaculus? Within a question’s forecast timeline, with other questions, with the resolution criteria? Yes, there are plenty! Metaculus is full of glaring inconsistencies. The median on one question will contradict the median on another. An AI question with stronger operationalization will have a lower median than a question with weaker operationalization. The current median says there is a four percent chance that AGI has already been developed. A question’s resolution criteria will say it can’t resolve at the upper bound, and yet the median will assign 14% to it resolving at the upper bound anyway.
It’s commendable to notice these inconsistencies, and right to downgrade your opinion of Metaculus because of them. But it’s wrong to conclude, even with weak confidence, that you are a better forecaster than most of the Metaculites forecasting on these questions merely because you can frequently observe such glaring inconsistencies, and can predict in advance that specific ones will happen, including changes over time in the median that are predictable even in expected value after accounting for skew. (And that conclusion is only about AGI questions; the implicit claim of being “a slightly better Bayesian” actually seems far stronger and more general than that.)
Why? Because Metaculites know there are glaring inconsistencies everywhere; they identify them often; they know there are more, and they could easily find and fix most of them. It’s not that you’re a better forecaster, just that you have unreasonable expectations of a community of forecasters who are almost all effectively unpaid volunteers.
It’s not surprising that the Metaculus median changes over time in specific, predictable ways that are inconsistent with good Bayesianism. That doesn’t mean the forecasters are that bad (let’s see you do better, after all); it’s because people’s energy and interest are scarce. Questions in tournaments with money prizes get more engagement, as do questions about things currently in the news. Even these questions still have glaring inconsistencies, because even that engagement isn’t enough to fix them all. (Also because the tools for making and checking your distributions are time-consuming to use.)
There are only 601 forecasters with more than 1000 points on Metaculus: that means only 601 forecasters who have done even a pretty basic amount of forecasting. One of the two forecasters with exactly 1000 points has made predictions on only six questions, for example; you can do that in less than an hour, so it’s really not a lot.
If 601 sounds like a lot, consider that there are thousands of questions on the site, each with a wall of text describing the background and the resolution criteria, and predictions need to be updated constantly. The most active predictors on the site burn out because it takes so much time.
It’s not reasonable to expect to see no inconsistencies, no predictable changes in the median, and so on. It’s not that they’re bad forecasters. Of course you can do better on one or a few specific questions, but that doesn’t mean much. If you want even a small but worthwhile amount of evidence, from correct advance predictions, that you are a better forecaster than other Metaculites, you need to, for example, win one of the tournaments with money prizes that many people participate in.
Evaluating forecasting track records in practice is hard and depends heavily on the scoring rule you use (rankings on PredictionBook, for example, vary a lot with your methodology for evaluating relative performance). You need a lot of high-quality data to get significant evidence; with only a little low-quality data, you just aren’t going to get a useful amount.
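Here is a constructed illustration of that scoring-rule dependence. The numbers are mine, invented for the example (not real Metaculus or PredictionBook data): ten binary questions that all resolved YES, one forecaster who always said 75%, and one who said 97% nine times but badly missed once with 10%. The two standard scoring rules rank them in opposite orders.

```python
import math

# Invented example data: ten binary questions, all resolved YES (1).
outcomes = [1] * 10
a = [0.75] * 10              # forecaster A: always moderately confident
b = [0.97] * 9 + [0.10]      # forecaster B: very confident, one bad miss

def brier(preds, outs):
    """Mean squared error between probability and outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(preds, outs)) / len(outs)

def log_loss(preds, outs):
    """Mean negative log-likelihood (lower is better)."""
    return sum(-math.log(p if o else 1 - p) for p, o in zip(preds, outs)) / len(outs)

print(f"Brier:    A={brier(a, outcomes):.4f}  B={brier(b, outcomes):.4f}")
print(f"Log loss: A={log_loss(a, outcomes):.4f}  B={log_loss(b, outcomes):.4f}")
# A wins under the Brier score, B wins under the log score:
# who "has the better track record" depends on the rule you pick.
```

The Brier score caps the penalty for any single miss at 1, while the log score punishes confident misses without bound, so with a small sample like this the choice of rule alone decides the ranking.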