Eliezer Yudkowsky comments on Contradict my take on OpenPhil’s past AI beliefs

Eliezer Yudkowsky 21 Dec 2025 5:57 UTC
19 points
3
I think if they sponsored Cotra’s work and cited it, this reflects badly on them. More on them than on Cotra, really; I am not a fan of the theory that you blame the people who were selected to have an opinion or incentivised to have an opinion, so much as the people who did the selection and incentivization. See https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works, which I think stands out as clearly correct in retrospect, for why their analysis was obviously wrong at the time. And I did in that case take the trouble to explain why their whole complicated analysis was bogus, and my model is that this clearly-correct-in-retrospect critique had roughly zero impact or effect on OpenPhil; and that is what I expected and predicted in advance, which is why I did not spend more effort trying to redeem an organization I modeled as irredeemably broken.
- mattmacdermott 21 Dec 2025 10:37 UTC
  22 points
  15
  Parent
  Do you find Daniel Kokotajlo’s subsequent work advocating for short timelines valuable? I ask because I believe that he sees/saw his work as directly building on Cotra’s^[1].
  I think the bar for work being a productive step in the conversation is lower than the bar for it turning out to be correct in hindsight or even its methodology being highly defensible at the time.
  Is your position more, “Producing such a model was a fine and good step in the conversation, but OP mistakenly adopted it to guide their actions,” or “Producing such a model was always going to have been a poor move”?
  1. ^
    I remember a talk in 2022 where he presented an argument for 10 year timelines, saying, “I stand on the shoulders of Ajeya Cotra”, but I’m on mobile and can’t hunt down a source. Maybe @Daniel Kokotajlo can confirm or disconfirm.
  - Daniel Kokotajlo 21 Dec 2025 15:59 UTC
    36 points
    3
    Parent
    Indeed I do think of it that way.
    - habryka 21 Dec 2025 23:09 UTC
      7 points
      −5
      Parent
      FWIW, I continue to think your models here are obviously not building on Cotra’s thing, and think something pretty weird is going on when you say they do. Which is not like catastrophic, but I think the credit allocation here feels quite weird.
    - Eliezer Yudkowsky 21 Dec 2025 16:20 UTC
      6 points
      0
      Parent
      Is your take “Use these different parameters and you get AGI in 2028 with the current methods”?
      - Daniel Kokotajlo 22 Dec 2025 20:19 UTC
        18 points
        3
        Parent
        At the time, iirc, I went through Ajeya’s spreadsheet and thought about each parameter and twiddled them to be more correct-according-to-me, and got something like median 2030 at the end.
      - StanislavKrym 21 Dec 2025 16:27 UTC
        1 point
        0
        Parent
        As far as Kokotajlo’s memory can be trusted after the ChatGPT moment, he thought that there would be a 50% chance to reach AGI in 2030.
        Eliezer Yudkowsky 21 Dec 2025 17:24 UTC
        10 points
        −5
        Parent
        If you can get that or 2050 equally well off yelling “Biological Anchoring”, why not admit that the intuition comes first and then you hunt around for parameters you like? This doesn’t sound like good methodology to me.
        What links here?
        StanislavKrym's comment on Why does Eliezer make abrasive public comments? by k64 (23 Dec 2025 2:30 UTC; -5 points)
        Daniel Kokotajlo 22 Dec 2025 20:22 UTC
        20 points
        0
        Parent
        I don’t think the intuition came first? I think it was playing around with the model that caused my intuitions to shift, not the other way around. Hard to attribute exactly ofc.
        
        Anyhow, I certainly don’t deny that there’s a big general tendency for people to fit models to their intuitions. I think you are falsely implying that I do deny that. I don’t know if I’ve loudly stated it publicly before but I definitely am aware of that and have been for years, and I’m embracing it in fact—the model is a helpful tool for articulating and refining and yes sometimes changing my intuitions, but the intuitions still play a central role. I’ll try to state that more loudly in future releases.
        StanislavKrym 21 Dec 2025 19:05 UTC
        1 point
        −2
        Parent
        One can apply similar methodological arguments to a different problem and test whether they persist. The amount of civilisations not in the Solar Systems is thought to be estimatable via the Drake equation. Drake’s original estimates implied that the Milky Way contains between 1K and 100M civilisations. The only ground truth that we know is the fact that we have yet to find any reliable evidence of such civilisations. But I don’t understand where the equation itself is erroneous.
        Returning to the AI timeline crux, Cotra’s idea was the following. TAI is created once someone spends enough compute. Cotra’s main idea was that compute_required = (compute_under_2020_knowledge)/(knowledge_factor(t)), compute_affordable increases exponentially before the bottlenecks related to the world’s economy. The estimates on compute_affordable required mankind only to keep track of who produces it, how it is done and who is willing to pay. A similar procedure was done in the AI-2027 compute forecast.
        Then Cotra proceeded to wildly misestimate. Her idea of the knowledge factor was that it makes creation of the TAI twice easier every 2-3 years, which I doubt that I would understand for reasons described in the collapsed sections. Cotra’s ideas on compute_under_2020_knowledge are total BS for reasons I detailed in another comment. Therefore, I fail to understand where Cotra was mistaken aside from using parameters that are total BS. Nor do I think that if Cotra’s model was correct aside from BSed parameters, it wouldn’t be a natural move to correct the parameters.
        Cotra’s rationalisation for the TAI to become twice as easy to create every few years
        I consider two types of algorithmic progress: relatively incremental and steady progress from iteratively improving architectures and learning algorithms, and the chance of “breakthrough” progress which brings the technical difficulty of training a transformative model down from “astronomically large” / “impossible” to “broadly feasible.”
        For incremental progress, the main source I used was Hernandez and Brown 2020, “Measuring the Algorithmic Efficiency of Neural Networks.” The authors reimplemented open source state-of-the-art (SOTA) ImageNet models between 2012 and 2019 (six models in total). They trained each model up to the point that it achieved the same performance as AlexNet achieved in 2012, and recorded the total FLOP that required. They found that the SOTA model in 2019, EfficientNet B0, required ~44 times fewer training FLOP to achieve AlexNet performance than AlexNet did; the six data points fit a power law curve with the amount of computation required to match AlexNet halving every ~16 months over the seven years in the dataset. They also show that linear programming displayed a similar trend over a longer period of time: when hardware is held fixed, the time in seconds taken to solve a standard basket of mixed integer programs by SOTA commercial software packages halved every ~13 months over the 21 years from 1996 to 2017.
        Grace 2013 (“Algorithmic Progress in Six Domains”) is the only other paper attempting to systematically quantify algorithmic progress that I am currently aware of, although I have not done a systematic literature review and may be missing others. I have chosen not to examine it in detail because a) it was written largely before the deep learning boom and mostly does not focus on ML tasks, and b) it is less straightforward to translate Grace’s results into the format that I am most interested in (“How has the amount of computation required to solve a fixed task decreased over time?”). Paul is familiar with the results, and he believes that algorithmic progress across the six domains studied in Grace 2013 is consistent with a similar but slightly slower rate of progress, ranging from 13 to 36 months to halve the computation required to reach a fixed level of performance.
        This means that the compute required is halving every 16 months for AlexNet, every 13 months for linear programming. While Claude Opus 4.5 does seem to think that Paul’s belief is close to what Grace’s paper implies, the paper’s relevance is likely undermined by Cotra’s own criticism. Next Cotra listed actual assumptions for each of the models, including the two clearly BSed ones. I marked her ideas on when the required compute is halved in bold:
        Cotra’s actual assumptions
        I assumed that:
        Training FLOP requirements for the Lifetime Anchor hypothesis (red) are halving once every 3.5 years and there is only room to improve by ~2 OOM from the 2020 level—moving from a median of ~1e28 in 2020 to ~1e26 by 2100.
        Training FLOP requirements for the Short horizon neural network hypothesis (orange) are halving once every 3 years and there is room to improve by ~2 OOM from the 2020 level—moving from a median of ~1e31 in 2020 to ~3e29 by 2100.
        Training FLOP requirements for the Genome Anchor hypothesis (yellow) are halving once every 3 years and there is room to improve by ~3 OOM from the 2020 level—moving from a median of ~3e33 in 2020 to ~3e30 by 2100.
        Training FLOP requirements for the Medium-horizon neural network hypothesis (green) are halving once every 2 years and there is room to improve by ~3 OOM from the 2020 level—moving from a median of ~3e34 in 2020 to ~3e31 by 2100.
        Training FLOP requirements for the Long-horizon neural network hypothesis (blue) are halving once every 2 years and there is room to improve by ~4 OOM from the 2020 level—moving from a median of ~1e38 in 2020 to ~1e34 by 2100.
        Training FLOP requirements for the Evolution Anchor hypothesis (purple) are halving once every 2 years and there is room to improve by ~5 OOM from the 2020 level—moving from a median of ~1e41 in 2020 to ~1e36 by 2100.
  - Eliezer Yudkowsky 21 Dec 2025 15:32 UTC
    16 points
    4
    Parent
    I think OpenPhil was guided by Cotra’s estimate and promoted that estimate. If they’d labeled it: “Epistemic status: Obviously wrong but maybe somebody builds on it someday” then it would have had a different impact and probably not one I found objectionable.
    
    Separately, I can’t imagine how you could build something not-BS on that foundation and if people are using it to advocate for short timelines then I probably regard that argument as BS and invalid as well.
    What links here?
    StanislavKrym's comment on Why does Eliezer make abrasive public comments? by k64 (23 Dec 2025 2:30 UTC; -5 points)
    - StanislavKrym 21 Dec 2025 16:13 UTC
      26 points
      1
      Parent
      Except that @Daniel Kokotajlo wrote an entire sequence where the only post published after the ChatGPT moment is this one. Kokotajlo’s sequence was supposed to explain that Cotra’s distribution of training compute for a TAI created by 2020′s ideas is biased towards requiring far more compute than is actually needed.
      Kokotajlo’s quote related to Cotra’s errors
      Ajeya’s timelines report is the best thing that’s ever been written about AI timelines imo. Whenever people ask me for my views on timelines, I go through the following mini-flowchart:
      1. Have you read Ajeya’s report?
      --If yes, launch into a conversation about the distribution over 2020′s training compute and explain why I think the distribution should be substantially to the left, why I worry it might shift leftward faster than she projects, and why I think we should use it to forecast AI-PONR instead of TAI.
      --If no, launch into a conversation about Ajeya’s framework and why it’s the best and why all discussion of AI timelines should begin there.
      However, Kokotajlo’s comment outright claims that everyone’s timelines should be a variation of Cotra’s model.
      Kokotajlo praising Cotra’s model
      So, why do I think it’s the best? Well, there’s a lot to say on the subject, but, in a nutshell: Ajeya’s framework is to AI forecasting what actual climate models are to climate change forecasting (by contrast with lower-tier methods such as “Just look at the time series of temperature over time / AI performance over time and extrapolate” and “Make a list of factors that might push the temperature up or down in the future / make AI progress harder or easier,” and of course the classic “poll a bunch of people with vaguely related credentials.”
      There’s something else which is harder to convey… I want to say Ajeya’s model doesn’t actually assume anything, or maybe it makes only a few very plausible assumptions. This is underappreciated, I think. People will say e.g. “I think data is the bottleneck, not compute.” But Ajeya’s model doesn’t assume otherwise! If you think data is the bottleneck, then the model is more difficult for you to use and will give more boring outputs, but you can still use it. (Concretely, you’d have 2020′s training compute requirements distribution with lots of probability mass way to the right, and then rather than say the distribution shifts to the left at a rate of about one OOM a decade, you’d input whatever trend you think characterizes the likely improvements in data gathering.)
      The upshot of this is that I think a lot of people are making a mistake when they treat Ajeya’s framework as just another model to foxily aggregate over. “When I think through Ajeya’s model, I get X timelines, but then when I extrapolate out GWP trends I get Y timelines, so I’m going to go with (X+Y)/2.” I think instead everyone’s timelines should be derived from variations on Ajeya’s model, with extensions to account for things deemed important (like data collection progress) and tweaks upwards or downwards to account for the rest of the stuff not modelled.
      What links here?
      StanislavKrym's comment on Why does Eliezer make abrasive public comments? by k64 (23 Dec 2025 2:30 UTC; -5 points)
  - StanislavKrym 21 Dec 2025 12:17 UTC
    4 points
    −8
    Parent
    What Kokotajlo likely means by claiming that “Ajeya’s timelines report is the best thing that’s ever been written about AI timelines imo” is that Cotra’s model was a good model where the values of parameters are total BS. Her report consists of four parts. In my opinion, two clearest example of BS are the genome anchor which had the transformative model “have about as many parameters as there are bytes in the human genome (~7.5e8 bytes)” and the evolution anchor which claimed “that training computation requirements will resemble the amount of computation performed in all animal brains over the course of evolution from the earliest animals with neurons to modern humans”. This is outright absurd since each animal has its own brain which is trained independently of others, and yet the human brain’s architecture and training are outright enough to have a chance to create the AI who does what a human genius can do.
- Lukas_Gloor 21 Dec 2025 22:14 UTC
  8 points
  −2
  Parent
  I think if they sponsored Cotra’s work and cited it, this reflects badly on them.
  I find that position weirdly harsh. Sure, if you’re just answering anaguma’s question as a binary (“does it reflect well or poorly, regardless of magnitude?”), that could make sense. (Note to readers: This would mean that the quote I started this comment with should be regarded as taken out of context!) But seeing it as reflecting badly at a high magnitude is the judgment I’d consider weirdly harsh.
  
  I’m saying that as someone who has very little epistemic respect for people who think AI ruin is only about 10% likely—I consider people who think that biased beyond hope.
  
  But back to the timelines point:
  
  It’s not like Bioanchors was claiming high confidence in its modelling assumptions or resultant timelines. At the time, a bunch of people in the broader EA ecosystem had even longer timelines, and Bioanchors IIRC took a somewhat strong stance against assigning significant probability mass for >2100, which some EAs at least considered non-obvious. Seen in that context, it contributed to people updating in the right direction. The report also contained footnotes pointing out that advisors held in high regard by Ajeya had shorter timelines based on specific thoughts on horizon lengths or whatever, so the report was hedging towards shorter timelines. Factoring that in, it aged less poorly than it would have if we weren’t counting those footnotes. Ajeya also posted an update 2 years later where she shortened her timelines a bunch. If it takes orgs only 2 years to update significantly in the right direction, are they really hopelessly broken?
  
  FWIW, I’m leaning towards you having been right about the critique (credit for sticking your neck out). But why is sponsoring or citing work like that such a bad sign? Sure, if they cited it as particularly authoritative, that would be different. But I don’t feel like Open Phil did that. (This seems like a crux based on your questions in the OP and your comments here; my sense from reading other people’s replies, and also my less informed impressions I got from interacting with some Open Phil staff at very few short occasions, is it that you were overestimating the degree to which Open Phil was attached to specific views.)
  For comparison, I think Carlsmith’s report on power-seeking was a lot worse in terms of where its predictions landed, so I’d have more sympathy if you pointed to it as an example of what reflects poorly on Open Phil (just want to flag that Carlsmith is my favorite philosophy writer in all of EA). However, also there, I doubt the report was particularly influential within Open Phil, and I don’t remember it being promoted as such. Also, I would guess that the pushback it received from many sides would have changed their evaluation of the report after it was written, if they had initially been more inclined to update on it. I mean, that’s part of the point of writing/publishing reports like that.
  
  Sure, maybe Open Phil was doing a bunch of work directed more towards convincing outside skeptics that what they’re doing is legitimate/okay rather than doing the work “for themselves”? If so, that’s a strategic choice… I can see it leading to biased epistemics, but in a world where things had gone better, maybe it would have gotten further billionaires on board with their mission of giving? And it’s not like doing the insular MIRI thing that you all had been doing before the recent change to get into public comms was risk-free for internal epistemics either. There are risks on both ends of the spectrum, outward-looking and deferring to many experts or at least “caring whether you can convince them”, and inward looking with a small/shrinking circle of people whose research opinions you respect.
  
  On whether some orgs are/were hopelessly broken: it’s possible. I feel sad about many things having aged poorly and I feel like the EA movement has done disappointingly poorly. I also feel like I’ve heard once or twice Open Phil staff saying disappointingly dismissive things about MIRI (even though many of the research directions there didn’t age well either).
  
  I don’t have a strong view on Open Phil anymore—it used to be that I had one (and it was positive), so I have became more skeptical. Maybe you’re picking up a real thing about Open Phil’s x-risk-focused teams having been irredeemably biased or clouded in their approaches. But insofar as you are, I feel like you’ve started with unfortunate examples that, at least to me, don’t ring super true. (I felt prompted to comment because I feel like I should be well-dispositioned to sympathize with your takes given how disappointed I am at the people who still think there’s only a 10% AI ruin chance.)