Noted. I think you are overlooking some of the dynamics of the weird dance that a bureaucratic institution does around pretending to be daring while its opinions are in fact insufficiently extreme; e.g., why, when OpenPhil ran a “change our views” contest, they predictably awarded all of the money to critiques arguing for longer timelines and lower risk, even though reality lay in the opposite direction from their opinions. Just like OpenPhil predictably gave all the money to “we need two Stalins” critiques of them in the contest, OpenPhil might have managed to communicate to the ‘superforecasters’ or their institutions that the demanded apparent disagreement with OpenPhil’s overt forecast was in the “we need two Stalins” direction of longer timelines and lower risks.
Or to rephrase: If I can look at the organizational dynamics and see it as obvious in advance that OpenPhil’s “challenge our worldviews” contest would award all the money to people arguing for longer timelines and lower risk (despite reality lying in the opposite direction, even according to those people’s own later updates), then maybe the people advertising themselves as producing superforecaster reports can successfully read OpenPhil’s mind about what direction of superforecaster disagreement is being secretly demanded.
But, sure, fair enough, I should also update somewhat in favor of the average superforecaster being even worse at AI than OpenPhil, and of them delivering an honest but terrible report. I guess it’s just surprising to me because I would’ve expected the key maneuver here to be saying “I dunno” and not throwing around extreme opinions or numbers, and I would’ve thought superforecasters would be better at that than OpenPhil… but eh, idk, maybe they just straight up couldn’t tell the difference between the usually good rule “nothing ever happens” and “AGI in particular never happens”, and also didn’t know themselves to be overconfident or incompetent at applying the rule.
If so, it would speak correspondingly poorly of those EAs who stood around gesturing at the superforecasters and saying, “Why believe MIRI when you could believe these great certified experts?”
(Quick flag that, if you have energy for more engagement, I’d most bid for a source for the claim “superforecasters assigned 1% to IMO gold by 2025”. As mentioned in my reply to your parallel comment.)
maybe the people advertising themselves as producing superforecaster reports, can successfully read OpenPhil’s mind about what direction of superforecaster disagreement is being secretly demanded
I agree that’s one possible hypothesis. It’s more complicated than “OP rewards agreement”, and I don’t currently see why I should assign a high prior to it. (Like, someone could also make a plausible-sounding argument for the opposite: that dysfunctional OP will of course want superforecasters to have more extreme views than OP itself, to provide cover and make OP’s own views look more moderate and reasonable by comparison.)
Combined with the evidence being pretty limited (I suppose (i) the XPT, and also (ii) one worldview critique contest that they wouldn’t have run if FTX hadn’t started it and then crashed, and where my impression is they weren’t excited about the resulting entries), I’m not sold.
maybe they just straight up couldn’t tell the difference between the usually good rule “nothing ever happens” and “AGI in particular never happens”, and also didn’t know themselves for overconfident or incompetent at being able to apply the rule.
I think this is probably a lot of what’s going on.