Epistemic Virtue
Taking a stab at the crux of this post:
The two sides have different ideas of what it means to be epistemically virtuous.
Yudkowsky wants people to be good Bayesians, which means, e.g., not over-updating on a single piece of evidence, or being so well calibrated that whatever news of new AI capabilities appears is already part of your model, so you don’t have to update again. Making publicly legible forecasts is not so important; what matters is making decisions based on an accurate model of the world. See the LW Sequences, his career, etc.
The OP is part of the Metaculus community and expects people to be good… Metaculeans? That is, they must fulfill the requirements for “forecaster prestige” mentioned in the OP. Their forecasts must be pre-registered, unambiguous, numeric, and numerous.
So it both makes perfect sense for Yudkowsky to criticize Metaculus forecasts for being insufficiently Bayesian (it made little sense that a forecast would be this susceptible to a single piece of news; compare with the LW discussion here), and for OP to criticize Yudkowsky for being insufficiently Metaculean (he doesn’t have a huge public catalog of Metaculean predictions).
So far, so good. However, this post fails to make the case that being Metaculean is more epistemically virtuous than being Bayesian. All it does is illustrate that these are different ways to measure this virtue. And if someone else doesn’t accept your standards, it makes little sense to criticize them for not adhering to yours. It’s not like you’re adhering to theirs!
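The “don’t over-update on a single piece of news” point can be made concrete with Bayes’ rule. This is a purely illustrative sketch with made-up numbers, not anyone’s actual forecast: if your model already anticipated the news, its likelihood ratio is close to 1 and the posterior barely moves.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update P(hypothesis) given evidence, where likelihood_ratio is
    P(evidence | hypothesis) / P(evidence | not hypothesis)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# News a calibrated model already expected (ratio near 1) barely moves the forecast:
print(posterior(0.30, 1.2))   # ~0.34
# News that is genuinely surprising under the model moves it a lot:
print(posterior(0.30, 10.0))  # ~0.81
```

A well-calibrated forecaster, on this view, has already priced most foreseeable news into their model, so a single announcement should rarely justify a large swing.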
Metaculus
I do think the Metaculean ideal deserves some criticism of its own.
Metaculus rewards good public forecasts with play points rather than real-world outcomes, and (last I heard, at least) has some issues like demanding that users update their predictions all the time or lose points; or granting people extra points for making predictions, irrespective of the outcome.
And that’s without mentioning that what we actually care about (e.g. when will we get AGI) is sometimes highly dependent on fiddly resolution criteria, to the point that you can’t even interpret some Metaculus forecasts, or make meaningful predictions about them, without reading walls of text.
(And from personal experience with a related platform: I tried the play money prediction market Manifold Markets for a bit. I earned some extra play money via some easy predictions, but then lost interest once it appeared that I couldn’t cash out of my biggest bet due to lack of liquidity. So now all my play money is frozen for a year and I don’t use the platform anymore.)
All in all, making predictions on playpoint sites like Metaculus sounds like a ton of work for little reward. I guess OP is an attempt to make people use it via social shaming, but I doubt the efficacy of this strategy. If it’s so important that Yudkowsky make Metaculean predictions, have you considered offering bags of money for good forecasts instead?
Final Thoughts
Finally, I find it a bit weird that OP complains about criticism of Metaculus forecasts. The whole point of public forecasting, or so it seemed to me, is for there to be a public record which people can use as a resource, to learn from or to critique. And why would it be necessary for those critics to be Metaculus users themselves? Metaculus is a tiny community; most critics of their forecasts will not be community members; and to disregard those critics would be a loss for Metaculus, not for the critics themselves.
This isn’t a good description of being on Metaculus versus being a Bayesian.
How does one measure whether they are “being a Bayesian”? The general point is that you can’t, unless you are being scored. You find out by making forecasts: if you aren’t updating, you get fewer points, or even lose points. Otherwise you have people who just say things that sound thematically Bayesian but don’t mean very much in terms of updated beliefs. Partly I’m making an epistemic claim that Eliezer can’t actually know whether he’s being a good Bayesian without proper forecasting. You can check out Tetlock’s work if you’re unsure why that would be the case, though I mention it in the post.
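The kind of scoring being appealed to here can be sketched with a proper scoring rule such as the logarithmic score. This is an illustrative sketch of the general idea, not Metaculus’s actual scoring formula: a forecaster who genuinely updated toward the truth outscores one who merely sounded Bayesian but left their number at 50%.

```python
import math

def log_score(forecast: float, outcome: bool) -> float:
    """Logarithmic scoring rule: higher (less negative) is better.
    Scores the probability the forecaster assigned to what actually happened."""
    p = forecast if outcome else 1 - forecast
    return math.log(p)

outcome = True  # suppose the event resolved YES
print(log_score(0.8, outcome))  # updated forecast, better score
print(log_score(0.5, outcome))  # stale forecast, worse score
```

Because the log score is proper, the forecaster maximizes their expected score by reporting their true belief, which is what makes this a measurement of calibration rather than of rhetoric.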
The more central epistemic claim I’m making in this essay: if someone says they are doing a better job of forecasting a topic than other people, but they aren’t actually placing forecasts so that we could empirically test whether they are, then that person’s forecasts should be regarded with deep suspicion. I claim this would hold in every other domain, AI timelines are unlikely to be that special, and his eminence doesn’t buy him a good justification for being held to drastically lower standards when it comes to measuring his forecast accuracy.
I understand that you value legibility extremely highly. But you don’t need a community or a scoring rule to assess your performance, you can just let reality do it instead. Surely Yudkowsky deserves a bajillion Bayes points for founding this community and being decades ahead on AI x-risk.
Bayes points may not be worth anything from the Metaculean point of view, but the way I understand you, you seem to be saying that forecasts are worth everything while ignoring actions entirely, which seems bizarre. That was the point in my original comment, that you two have entirely different standards. Yudkowsky isn’t trying to be a good Metaculean, so of course he doesn’t score highly from your point of view.
An analogy could be Elon Musk. He’s done great things that I personally am absolutely incapable of. And he does deserve praise for those things. And indeed, Eliezer was a big influence on me. But he gives extreme predictions that probably won’t age well.
His starting this site and writing a million words about rationality is wonderful and outstanding. But do you think it predicts forecasting performance nearly as well as actual, properly scored forecasting does? I claim it doesn’t come anywhere near as good a predictor as simply making some actual forecasts and seeing what happens, and I don’t see the opposing position holding up at all. You can argue that “we care about other things than just forecasting ability,” but in this thread I am specifically referring to his implied forecasting accuracy, not his other accomplishments. The way you’re referring to Bayes points here doesn’t seem workable or coherent, any more than Musk Points would tell me his predictions are accurate.
If people (like Musk) are continually successful, you know they’re doing something right. One-off success can be survivorship bias, but the odds of having continued success by mere happenstance get very low, very quickly.
When I call that “getting Bayes points”, what I mean is that if someone demonstrates good long-term decision-making, or gets good long-term outcomes, or arrives at an epistemic state more quickly, you know they’re doing some kind of implicit forecasting correctly, because long-term decisions in the present are evaluated by the reality of the future.
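The “very low, very quickly” arithmetic is just geometric decay. A minimal sketch with a made-up per-success luck probability (the 0.5 is an assumption for illustration, not an estimate of anyone’s actual odds):

```python
def fluke_probability(p: float, k: int) -> float:
    """Probability of k independent successes in a row by pure luck,
    if each success has probability p of happening by chance."""
    return p ** k

print(fluke_probability(0.5, 1))   # 0.5
print(fluke_probability(0.5, 10))  # ~0.001: ten lucky wins in a row is very unlikely
```

Of course, this assumes the successes are independent, which is exactly what the “success breeds more success” objection below denies.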
This whole discussion vaguely reminds me of conflicts between, e.g., boxing and mixed martial arts (MMA) advocates: the former has more rules, while the latter is more flexible, so how can two competitors from their respective disciplines determine which of them is the better martial artist? They could both compete in boxing, or both compete in MMA. Or they could decide not to bother and remain in their own arenas.
I guess it seems to you like Yudkowsky has encroached on the Metaculus arena but isn’t playing by the Metaculus rules?
No, success and fame are not very informative about forecasting accuracy. Yes, they are strongly indicative of other competencies, but you shouldn’t mix those into our measure of forecasting. And nebulous, unscorable statements don’t work as “success” at all; they’re too cherry-picked and unworkable. Musk is famously uncalibrated, with famously bad timeline predictions in his own domain! I don’t think you should be glossing over that in this context by saying “Well, he’s successful...”
If we are talking about measuring forecasting performance, then it’s more like comparing tournament Karate with trench warfare.
I’m going to steal the tournament karate and trench warfare analogy. Thanks.
Unless success breeds more success, irrespective of other factors.