But I’m in much closer agreement with that scenario than the vast majority of AI safety & alignment researchers today, who tend to see the “foom & doom” scenario above as somewhere between “extraordinarily unlikely” and “already falsified”!
Those researchers are not asking each other “is it true?”, but rather “lol, can you believe that some people used to believe that?”.[1] Oh well. Laugh all you want. It’s still what I believe.
To clarify my views:
I think it’s very unlikely (maybe 3%) that a small team with computational resources below the equivalent of 32 H100s builds a system which rockets from unimpressive to ASI in <2 weeks (prior to some other larger and better-resourced group creating powerful AI, and conditional on not being in a regime where other groups are deliberately not advancing capabilities for a sustained period, e.g. due to governance).
I don’t think it’s “already falsified”, but I do think we’ve gotten evidence against this perspective. In particular, this perspective makes ~no prediction about the economic impact of earlier AI systems or about investment (and at least Eliezer was predicting we wouldn’t see earlier economic effects), while an alternative, more continuous / slower-takeoff perspective does predict massive investment. We’ve seen massive investment, so we should update some toward the slower-takeoff perspective. This isn’t a huge update (I think something like 2:1), so if you were very confident, it doesn’t make much difference (the arithmetic is sketched below).
My view of “3%” is roughly my current inside view, but I don’t think this is very reflectively stable. I think if I were forecasting, I’d probably go a touch higher due to some deference to people who think this is more likely.
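The arithmetic behind that 2:1 point, as a minimal sketch in odds form (the example priors are arbitrary placeholders, not anyone's stated credences):

```python
def posterior(prior, likelihood_ratio):
    """Bayes update in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Evidence favoring slower takeoff at ~2:1 is a 0.5x likelihood ratio for the foom view.
for prior in (0.03, 0.30, 0.90):
    print(f"prior {prior:.0%} -> posterior {posterior(prior, 0.5):.1%}")
# prior 3% -> ~1.5%, prior 30% -> ~17.6%, prior 90% -> ~81.8%:
# a confident believer stays confident, which is the "doesn't make much difference" point.
```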
I think it’s plausible but unlikely that sudden large paradigm shifts or sudden large chunks of algorithmic progress happen and cause a huge jump in capabilities (by this, I mean something at least as big as 2 years of overall AI progress, which is perhaps 400x effective compute, though this might not be meaningful due to qualitative limits on current approaches). Perhaps I think this is 10% likely prior to AIs which can fully automate AI R&D and about 20% likely at any point prior to crazy ASI.
This is made more plausible by higher compute availability, by more research on AI, and by substantial AI automation of AI R&D.
I tend to think this is somewhat underrated among people working in AI safety.
It seems plausible but unlikely that takeoff is very fast because at the point of AGI, the returns to further compute / algorithmic progress are much higher than in the past. In particular, I think we currently see something like 1 SD (standard deviation) of human-equivalent performance per 10x increase in effective compute in LLMs (I have an unreleased post discussing this in more detail which I plan on posting soon), and I can easily imagine this increasing to more like 4-6 SD per 10x, such that you blow through the human range quite quickly. (Though more like months or maybe weeks than days.) Scaling up the human brain by 10x (post-adaptation and resolving issues that might show up) would probably be something like +4 SD of IQ from my understanding.
Edit: my post discussing what I expect we see per 10x increase in effective compute is now up: What does 10x-ing effective compute get you?
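To make "blow through the human range quite quickly" concrete, a rough sketch: the 1 SD and 4-6 SD per 10x figures are from the comment above, while the assumption that the human range spans roughly 6 SD (about -3 SD to +3 SD) is a placeholder of mine.

```python
HUMAN_RANGE_SD = 6.0  # treating the human range as roughly -3 SD to +3 SD (placeholder)

def ooms_to_cross(sd_per_10x):
    """Orders of magnitude of effective compute needed to traverse the human range."""
    return HUMAN_RANGE_SD / sd_per_10x

for rate in (1.0, 4.0, 6.0):  # SD of human-equivalent performance per 10x effective compute
    ooms = ooms_to_cross(rate)
    print(f"{rate} SD per 10x -> {ooms:.1f} OOMs ({10 ** ooms:,.0f}x effective compute)")
# 1 SD/10x needs ~6 OOMs (1,000,000x); 4-6 SD/10x needs only ~1-1.5 OOMs (10-32x).
# How long that takes depends on how fast effective compute grows at that point
# (historically maybe ~400x per 2 years, per the figure above, and plausibly much faster
# once AI R&D is automated), which is what drives the "months or maybe weeks" guess.
```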
I think the event “what happened is that LLMs basically scaled to AGI (or really full automation of AI R&D) and were the key paradigm (including things like doing RL on agentic tasks with an LLM initialization and a deep learning based paradigm)” is maybe like 65% likely conditional on AGI before 2035. (The event “ASI was made basically by scaling LLMs” is probably much less likely (idk 30%), but I feel even more confused by this.)
This view isn’t very well considered and it’s plausible I substantially update on further reflection, but it’s hard for me to imagine going above 90% or below 30%.
I think of most of my work as not making strong assumptions about the paradigm, except that it assumes AIs are trainable and I’m assuming relatively slower takeoff.
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI. This group comprises probably >95% of people working on AI alignment, safety, and governance today[3].
(For many people in this group, if you ask them directly whether there might be important changes in AI algorithms, training approaches, etc., between today and ASI, they’ll say “Oh yes, of course that’s possible”. But if you ask them any other question about the future of AI, they’ll answer as if they expect no such change.)
There’s a very short answer to why I disagree with those LLM-focused researchers on foom & doom: They expect LLMs to scale to ASI, and I don’t.
As noted above, I don’t feel particularly strongly that LLMs will scale to ASI, and this isn’t a very load-bearing part of my perspective.
Further, I don’t think my views about continuity and slower takeoff (more like 6 months to a few years depending on what you’re counting, but also with some probability on more like a decade) are that strongly driven by putting a bunch of probability on LLMs scaling to AGI / full automation of AI R&D. It’s based on:
Specific observations about the LLM and ML paradigm, both because something close to this is a plausible paradigm for AGI and because it updates us about rates we’d expect in future paradigms.
Views that compute is likely to be a key driver of progress and that things will first be achieved at a high level of compute. (Due to mix of updates from LLMs/ML and also from general prior views.)
Views about how technology progress generally works as also applied to AI. E.g., you tend to get a shitty version of things before you get the good version of things which makes progress more continuous.
So to address some things on this topic, before I write out a full comment on the post:
I think “what happened is that LLMs basically scaled to AGI (or really full automation of AI R&D) and were the key paradigm (including things like doing RL on agentic tasks with an LLM initialization and a deep learning based paradigm)” is maybe like 65% likely conditional on AGI before 2035.
Flag, but I’d move the year to 2030 or 2032, for 2 reasons:
This is when the compute scale-up must slow down; in particular, this is when new fabs have to actively be built to create more compute (absent reversible computation being developed).
This is when the data wall starts to bite for real in pre-training: once there’s no more easily available data, naively scaling would take 2 decades at best, and by then algorithmic innovations may have been found that make AIs more data-efficient (a rough sketch of the data ceiling is below).
So if we don’t see LLMs basically scale to fully automating AI R&D at least by 2030-2032, then it’s a huge update that a new paradigm is likely necessary for AI progress.
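As a rough illustration of that data-wall ceiling, here is a sketch under standard Chinchilla-style heuristics (training cost roughly 6 * params * tokens, with roughly 20 tokens per parameter at compute optimality); the unique-token stock of 5e14 is a loose placeholder of mine, not a figure from the post:

```python
def max_chinchilla_compute(token_stock):
    """Rough ceiling on compute-optimal pre-training FLOP given a fixed stock of unique tokens.

    Uses the standard Chinchilla heuristics: tokens ~= 20 * params, cost ~= 6 * params * tokens.
    """
    params = token_stock / 20
    return 6 * params * token_stock

tokens = 5e14  # placeholder order-of-magnitude guess at usable unique tokens, not a sourced figure
print(f"~{max_chinchilla_compute(tokens):.0e} FLOP before naive compute-optimal scaling runs out of data")
# ~8e+28 FLOP; past that, further gains require repeated data, synthetic data, or
# data-efficiency / algorithmic improvements rather than naive scale-up.
```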
On this:
Specific observations about the LLM and ML paradigm, both because something close to this is a plausible paradigm for AGI and because it updates us about rates we’d expect in future paradigms.
I’d expect the evidential update to be weaker than you suppose. In particular, I’m not sold on the idea that LLMs usefully inform us about what to expect, because a non-trivial part of their current performance comes from tasks which don’t require that much long-term context, and this probably explains a lot of the difference between benchmarks and reality right now:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
The other issue is that AIs currently have a fixed error rate, and the trend only improves because each new training run produces a model with a lower error rate; we have reason to believe that humans don’t have a fixed error rate, and this is probably the remaining advantage of humans over AIs (a toy comparison is sketched further below):
https://www.lesswrong.com/posts/Ya7WfFXThJ6cn4Cqz/ai-121-part-1-new-connections#qpuyWJZkXapnqjgT7
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF
But of course, the interesting thing here is that the human baselines do not seem to hit this sigmoid wall. It’s not the case that if a human can’t do a task in 4 hours there’s basically zero chance of them doing it in 48 hours and definitely zero chance of them doing it in 96 hours etc. Instead, human success rates seem to gradually flatline or increase over time, especially if we look at individual steps: the more time that passes, the higher the success rates become, and often the human will wind up solving the task eventually, no matter how unprepossessing the early steps seemed. In fact, we will often observe that a step that a human failed on earlier in the episode, implying some low % rate, will be repeated many times and quickly approach 100% success rates! And this is true despite earlier successes often being millions of vision+text+audio+sensorimotor tokens in the past (and interrupted by other episodes or tasks themselves equivalent to millions of tokens), raising questions about whether self-attention over a context window can possibly explain it. Some people will go so far as to anthropomorphize human agents and call this ‘learning’, and so I will refer to these temporal correlations as learning too.
So I tentatively believe that in the case of a new paradigm arising, takeoff will probably be faster than with LLMs by some margin, though I do think slow-takeoff worlds are plausible.
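To make the fixed-error-rate contrast above concrete, a toy comparison (the 2% per-step error rate and the halve-every-50-steps learning curve are made-up parameters):

```python
def p_success_fixed(n_steps, err):
    """Chance of finishing an n_steps task when every step has the same error rate."""
    return (1 - err) ** n_steps

def p_success_learning(n_steps, err0, halving_every):
    """Same, but the per-step error rate halves every `halving_every` steps (toy 'learning')."""
    p = 1.0
    for i in range(n_steps):
        p *= 1 - err0 * 0.5 ** (i / halving_every)
    return p

for n in (10, 100, 1000):
    print(f"{n:>4} steps: fixed 2% error -> {p_success_fixed(n, 0.02):.3f}, "
          f"learning agent -> {p_success_learning(n, 0.02, 50):.3f}")
# The fixed-error agent's success collapses on long tasks (the sigmoid wall in the quote),
# while the learning agent's success rate flattens out at some positive level instead.
```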
Views that compute is likely to be a key driver of progress and that things will first be achieved at a high level of compute. (Due to mix of updates from LLMs/ML and also from general prior views.)
I think this is very importantly true, even in worlds where the ultimate compute cost of human-level intelligence is insanely cheap (like 10^14 FLOP or cheaper for inference, and 10^18 or less for training compute).
We should expect high initial levels of compute for AGI before we see major compute efficiency increases.
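For a sense of scale on those numbers, a rough calculation; the ~1e15 FLOP/s dense BF16 throughput assumed for one H100 is my figure, and utilization is idealized:

```python
H100_FLOP_PER_S = 1e15   # assumed dense BF16 throughput of one H100, idealized utilization

inference_flop = 1e14    # the "10^14 FLOP or cheaper" inference figure above
training_flop = 1e18     # the "10^18 or less" training figure above

print(f"inference: {inference_flop / H100_FLOP_PER_S:.2f} s on one idealized H100")
print(f"training:  {training_flop / H100_FLOP_PER_S / 60:.0f} min on one idealized H100")
# ~0.1 s and ~17 minutes: tiny compared to today's frontier budgets, which is why the claim
# is that the first AGI still shows up at high compute, before efficiency gains like these.
```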
Views about how technology progress generally works as also applied to AI. E.g., you tend to get a shitty version of things before you get the good version of things which makes progress more continuous.
This is my biggest worldview take on what I think the change of paradigms will look like (if it happens). While there are threshold effects, we should expect memory and continual learning to be pretty shitty at first, and gradually get better.
While I do expect a discontinuity in usefulness, for reasons shown below, I do agree that the path to the new paradigm (if it happens) is going to involve continual improvements.
Reasons are below:
https://x.com/AndreTI/status/1934747831564423561
But I think this style of analysis suggests that for most tasks, where verification is costly and reliability is important, you should expect a fairly long stretch of less-than-total automation before the need for human labor abruptly falls off a cliff.
The general behavior here is that as the model gets better at both doing and checking, your cost smoothly asymptotes to the cost of humans checking the work, and then drops almost instantaneously to zero as the quality of the model approaches 100%.
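A minimal sketch of the cost model the quoted passage describes, with arbitrary parameter values (check_cost, redo_cost, and trust_threshold are hypothetical names, not from the source):

```python
def human_cost_per_task(quality, check_cost=1.0, redo_cost=10.0, trust_threshold=0.999):
    """Expected human labor per task in a check-then-redo-failures workflow.

    quality: probability the AI's output is acceptable.
    Once quality clears `trust_threshold`, checking is skipped and human cost drops to ~0.
    """
    if quality >= trust_threshold:
        return 0.0
    return check_cost + (1 - quality) * redo_cost

for q in (0.50, 0.90, 0.99, 0.999):
    print(f"quality {q:.3f} -> human cost {human_cost_per_task(q):.2f}")
# Cost asymptotes smoothly toward the checking cost (1.0) as quality rises, then drops
# almost instantaneously to ~0 once quality is high enough to skip checks, i.e. the
# "asymptote, then cliff" shape the quoted passage describes.
```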
Scaling up the human brain by 10x (post-adaptation and resolving issues that might show up) would probably be something like +4 SD of IQ from my understanding.
How did you get this estimate?
See footnote 8 here.
Thanks!