So to address some things on this topic, before I write out a full comment on the post:
I think “what happened is that LLMs basically scaled to AGI (or really full automation of AI R&D) and were the key paradigm (including things like doing RL on agentic tasks with an LLM initialization and a deep learning based paradigm)” is maybe like 65% likely conditional on AGI before 2035.
Flag, but I’d move the year to 2030 or 2032, for 2 reasons:
This is when the compute scale-up has to slow down, and in particular it is when new fabs have to be actively built to create more compute (absent reversible computation being developed).
This is when the data wall starts to bind for real in pre-training: once there is no more easily available data, naive scaling would take roughly two decades at best, and by then algorithmic innovations that make AIs more data-efficient may well have been found (rough numbers in the sketch below).
So if we don’t see LLMs basically scale to fully automating AI R&D by 2030-2032 at the latest, that’s a huge update that a new paradigm is likely necessary for AI progress.
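To put the second reason in rough numbers, here is a back-of-the-envelope sketch. Everything in it (the 100x target, the ~4x/year current trend, the ~1.26x/year fallback once spending and data stop scaling) is an illustrative assumption of mine, not a figure from the post; the point is just that dropping from a spending-plus-data-driven growth rate to a hardware-only growth rate turns a few years of scaling into roughly two decades.

```python
# Back-of-the-envelope for "naively scaling will now take ~2 decades".
# All growth rates and the 100x target are illustrative assumptions.
import math

def years_to_scale(target_factor: float, annual_growth: float) -> float:
    """Years to grow effective training compute by target_factor at a
    constant annual_growth multiplier per year."""
    return math.log(target_factor) / math.log(annual_growth)

TARGET = 100.0  # suppose another ~100x of effective scale is needed (assumed)

# Current regime: spending, fab capacity, and data all still scaling up.
fast = years_to_scale(TARGET, annual_growth=4.0)   # ~4x/year (assumed)

# Post-2030-ish regime: data wall hit and fab build-out saturated, so only
# hardware price-performance improvements remain.
slow = years_to_scale(TARGET, annual_growth=1.26)  # ~2x every 3 years (assumed)

print(f"100x at 4.00x/year: {fast:.1f} years")  # ~3.3 years
print(f"100x at 1.26x/year: {slow:.1f} years")  # ~20 years, i.e. about two decades
```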
On this:
Specific observations about the LLM and ML paradigm, both because something close to this is a plausible paradigm for AGI and because it updates us about rates we’d expect in future paradigms.
I’d expect the evidential update to be weaker than you suppose. In particular, I’m not sold that LLMs usefully inform us about what to expect, because a non-trivial part of their current performance comes from tasks that don’t require much long-term context, and this probably explains a lot of the gap between benchmarks and reality right now:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
The other issue is that AIs have a fixed error rate, and the trend is driven by each new training run producing a model with a lower error rate, whereas we have reason to believe that humans don’t have a fixed error rate; this is probably the remaining advantage of humans over AIs (a toy illustration of the contrast is sketched after the quote below):
https://www.lesswrong.com/posts/Ya7WfFXThJ6cn4Cqz/ai-121-part-1-new-connections#qpuyWJZkXapnqjgT7
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF
But of course, the interesting thing here is that the human baselines do not seem to hit this sigmoid wall. It’s not the case that if a human can’t do a task in 4 hours there’s basically zero chance of them doing it in 48 hours and definitely zero chance of them doing it in 96 hours etc. Instead, human success rates seem to gradually flatline or increase over time, especially if we look at individual steps: the more time that passes, the higher the success rates become, and often the human will wind up solving the task eventually, no matter how unprepossessing the early steps seemed. In fact, we will often observe that a step that a human failed on earlier in the episode, implying some low % rate, will be repeated many times and quickly approach 100% success rates! And this is true despite earlier successes often being millions of vision+text+audio+sensorimotor tokens in the past (and interrupted by other episodes or tasks themselves equivalent to millions of tokens), raising questions about whether self-attention over a context window can possibly explain it. Some people will go so far as to anthropomorphize human agents and call this ‘learning’, and so I will refer to these temporal correlations as learning too.
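To make that contrast concrete, here is a toy model of my own construction (the parameters are made up, and this is not a fit to METR’s data): one agent keeps a fixed 2% per-step error rate, the other starts at the same 2% but halves its error rate every 20 steps as a crude stand-in for the ‘learning’ described above. The fixed-error agent’s success probability decays geometrically toward zero as tasks get longer, while the learning agent’s flattens out at a positive value.

```python
# Toy contrast: fixed per-step error rate vs. an error rate that improves
# with repetition. All parameters are made up for illustration.

def fixed_error_success(n_steps: int, per_step_error: float) -> float:
    """An agent whose per-step error rate never changes must get every
    step right, so success decays geometrically with task length."""
    return (1.0 - per_step_error) ** n_steps

def learning_success(n_steps: int, initial_error: float, halving_every: int) -> float:
    """An agent whose per-step error rate halves every `halving_every`
    steps -- a crude stand-in for within-task human learning."""
    p, error = 1.0, initial_error
    for step in range(1, n_steps + 1):
        p *= 1.0 - error
        if step % halving_every == 0:
            error /= 2.0
    return p

for n in (10, 100, 1000):
    print(f"{n:5d} steps: fixed={fixed_error_success(n, 0.02):.4f}  "
          f"learning={learning_success(n, 0.02, 20):.4f}")
# fixed-error agent:  ~0.82 -> ~0.13 -> ~0.00 (collapses with task length)
# learning agent:     ~0.82 -> ~0.46 -> ~0.45 (flattens out instead)
```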
So I tentatively believe that if a new paradigm arises, takeoff will probably be somewhat faster than with LLMs, though I do think slow-takeoff worlds are plausible.
Views that compute is likely to be a key driver of progress and that things will first be achieved at a high level of compute. (Due to mix of updates from LLMs/ML and also from general prior views.)
I think this is very importantly true, even in worlds where the ultimate compute cost of human-level intelligence is insanely cheap (like 10^14 flops or cheaper for inference, and 10^18 or less for training compute); some rough numbers on the gap are sketched below.
We should expect high initial levels of compute for AGI before we see major compute efficiency increases.
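Some rough orders of magnitude on that gap, as a sketch: I’m reading the 10^18 figure as total training FLOP, and the 10^26 frontier-run estimate and the 3x/year efficiency-improvement rate below are my own rough assumptions, not anything from the post. The takeaway is that even if the ultimate floor is that cheap, closing the gap from frontier scale plausibly takes well over a decade of efficiency gains, which is the sense in which the first systems show up at high compute.

```python
# Gap between "AGI first achieved near frontier scale" and the hypothesized
# ultimate training-compute floor, and a guess at how long it takes to close.
# Every number except the 1e18 floor (from the comment above) is assumed.
import math

floor_training_flop = 1e18        # hypothesized ultimate floor (from the comment)
frontier_training_flop = 1e26     # assumed order of magnitude of a current frontier run
algorithmic_gain_per_year = 3.0   # assumed rate of training-efficiency improvement

gap = frontier_training_flop / floor_training_flop
years = math.log(gap) / math.log(algorithmic_gain_per_year)

print(f"efficiency headroom:          ~10^{math.log10(gap):.0f}x")  # ~10^8x
print(f"years to close it at 3x/year: ~{years:.0f}")                # ~17 years
```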
Views about how technology progress generally works as also applied to AI. E.g., you tend to get a shitty version of things before you get the good version of things which makes progress more continuous.
This is my biggest worldview take on what a change of paradigms would look like (if it happens). While there are threshold effects, we should expect memory and continual learning to be pretty shitty at first and to gradually get better.
While I do expect a discontinuity in usefulness, for reasons shown below, I do agree that the path to the new paradigm (if it happens) is going to involve continual improvements.
Reasons are below (I sketch the quoted cost model right after the quotes):
https://x.com/AndreTI/status/1934747831564423561
But I think this style of analysis suggests that for most tasks, where verification is costly and reliability is important, you should expect a fairly long stretch of less-than-total automation before the need for human labor abruptly falls off a cliff.
The general behavior here is that as the model gets better at both doing and checking, your cost smoothly asymptotes to the cost of humans checking the work, and then drops almost instantaneously to zero as the quality of the model approaches 100%.
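For what it’s worth, here is a minimal sketch of how I read that cost model. The functional form, the 100:1 human-check-to-model cost ratio, and the 99.9% trust threshold are all my own illustrative assumptions, not anything from the quoted source; the only point is to show the asymptote-then-cliff shape described above.

```python
# Minimal sketch of the "cost asymptotes to human checking, then drops" shape.
# Costs and the trust threshold are illustrative assumptions.

def expected_cost(model_quality: float,
                  human_check_cost: float = 1.0,
                  model_cost: float = 0.01,
                  trust_threshold: float = 0.999) -> float:
    """Expected cost to get one verified-correct piece of work.

    Below the trust threshold, a human checks every attempt and failed
    attempts are redone, so expected cost is (model + check) / quality.
    Above it, checking is dropped and only the tiny model cost remains.
    """
    if model_quality >= trust_threshold:
        return model_cost / model_quality
    return (model_cost + human_check_cost) / model_quality

for q in (0.5, 0.9, 0.99, 0.999, 0.9999):
    print(f"quality={q:.4f}  expected cost={expected_cost(q):.3f}")
# Cost falls toward ~1.0 (the human-checking cost) as quality rises, then
# collapses to ~0.01 once quality crosses the trust threshold.
```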