Foom & Doom 1: “Brain in a box in a basement”
1.1 Series summary and Table of Contents
This is a two-post series on AI “foom” (this post) and “doom” (next post).
A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this new mega-mind. The extinction of humans (and every other species) would rapidly follow (“doom”). The ASI would then spend countless eons fulfilling its desires, desires which we humans would find to be bizarre and pointless.
Now, I don’t endorse every word of that foom & doom scenario above—for example, I don’t think “foom” requires recursive self-improvement. But I’m in much closer agreement with that scenario than the vast majority of AI safety & alignment researchers today, who tend to see the “foom & doom” scenario above as somewhere between “extraordinarily unlikely” and “already falsified”!
Those researchers are not asking each other “is it true?”, but rather “lol, can you believe that some people used to believe that?”.[1] Oh well. Laugh all you want. It’s still what I believe.
Conversely, from my perspective as a foom & doomer, it’s the mainstream contemporary AI alignment discourse that feels increasingly foreign and strange. How, I ask myself, do so many seemingly reasonable people wind up with such wildly, bafflingly over-optimistic beliefs as “P(doom)≲50%”??
Anyway, my main goal in these two posts is to explore how I wind up in such a different place from most other alignment researchers today, on the question of foom & doom. I don’t particularly expect to win skeptical readers over to my side, but would at least like to convey that foom & doom is a story that hangs together and deserves a modicum of consideration.
1.1.2 Should I stop reading if I expect LLMs to scale to ASI?
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI. This group comprises probably >95% of people working on AI alignment, safety, and governance today[3].
(For many people in this group, if you ask them directly whether there might be important changes in AI algorithms, training approaches, etc., between today and ASI, they’ll say “Oh yes, of course that’s possible”. But if you ask them any other question about the future of AI, they’ll answer as if they expect no such change.)
There’s a very short answer to why I disagree with those LLM-focused researchers on foom & doom: They expect LLMs to scale to ASI, and I don’t. Instead I expect that ASI will be a very different AI paradigm: “brain-like AGI” (more on which below and in the next post).
So if you’re an LLM-focused reader, you may be thinking: “Well, Steve is starting from a weird premise, so no wonder he gets a weird conclusion. Got it. Cool. …Why should I bother reading 15,000 more words about this topic?”
But before you go, I do think there are lots of interesting details in the story of exactly how those different starting premises (LLMs vs a different paradigm) flow down to wildly divergent views on foom & doom.
And some of those details will also, incidentally, clarify disagreements within the LLM-focused community. For example,
I might say “I expect next-paradigm AIs to be different from (and more dangerous than) current LLMs because of X, Y, Z”;
and meanwhile, an LLM-focused doomer might say “I expect future, more powerful, LLMs to be different from (and more dangerous than) current LLMs because of one or more of the very same X, Y, Z!”
So I’m hopeful that these posts will have some “food for thought” for doomers like me trying to understand where those P(doom)≲50% “optimists” are coming from, and likewise for “optimists” trying to better understand doomers.
1.2 Post summary and Table of Contents
This post covers “foom”, my belief that there will be a sharp localized takeoff, in which a far more powerful and compute-efficient kind of AI emerges suddenly into an utterly unprepared world. I explore the scenario, various arguments against it and why I don’t find them compelling, and the terrifying implications (if true) for our prospects for AI governance, supervised oversight, testing, and more. Here’s the outline:
In Section 1.3, I argue that there is a yet-to-be-discovered “simple(ish) core of intelligence”, which would constitute a future AI paradigm. I offer the human brain as an existence proof, and I argue that most LLM-focused researchers—even researchers who think of themselves as “AGI-pilled”—have an insufficiently radical picture in their head for what real AGI could do.
Then in Section 1.4, I respond to four popular counterarguments, including “If such a thing existed, then somebody would have already found it!” and “So what? A new paradigm would just be another variety of ML.”
In Section 1.5, I suggest that this new paradigm will involve frighteningly little training compute, compared to what we’re used to from LLMs.
Section 1.6 then explores some consequences of “frighteningly little training compute”, particularly my pessimism about efforts to delay or govern ASI. Worse, those efforts tend to involve public messaging that sounds like “boo LLMs”, and these messages are getting co-opted by precisely those AI researchers whose work is most dangerous—namely, the researchers trying to replace LLMs with a more powerful AI paradigm.
In Section 1.7, I argue that the amount of R&D required for this future scary AI paradigm to cross the gap from “seemingly irrelevant” to superintelligence will probably be tiny, like maybe 0–30 person-years. This belief is unusual, and wildly different from what’s happened with LLMs; indeed, AI-2027 envisions millions of person-years of R&D to cross that same capabilities gap.
Section 1.8 explores some consequences that would follow from that “very little R&D” hypothesis, including probably an extremely sharp and local takeoff even without recursive self-improvement; a lack of any pre-takeover ‘deployment’ (internal or external) of this future scary paradigm AI; and an AI getting a decisive strategic advantage (opportunity to kill everyone and run the world by itself).
Section 1.9 is a brief discussion of timelines-to-AGI. I throw out some numbers (“probably 5–25 years”), but who knows really.
1.3 A far-more-powerful, yet-to-be-discovered, “simple(ish) core of intelligence”
LLMs are very impressive, but they’re not AGI yet—not by my definition. For example, existing AIs are nowhere near capable of autonomously writing a business plan and then founding a company and growing it to $1B/year revenue, all with zero human intervention. By analogy, if humans were like current AIs, then humans would be able to do some narrow bits of founding and running companies by ourselves, but we would need some intelligent non-human entity (angels?) to repeatedly intervene, assign tasks to us humans, and keep the larger project on track.
Of course, humans (and groups of humans) don’t need the help of angels to conceive and carry out ambitious projects, like building businesses or going to the moon. We can do it all by ourselves. So by the same token, future AGIs (and groups of AGIs) won’t need the help of humans.
…So that’s my pitch that AGI doesn’t exist yet. And thus, the jury is still out on what AGI (and later, ASI) will look like, or how it will be made.
My expectation is that, for better or worse, LLMs will never be able to carry out those kinds of projects, even after future advances in scaffolding, post-training, and so on. If I’m right, that wouldn’t mean that those projects are beyond the reaches of AI—it’s clearly possible for some algorithm to do those things, because humans can! Rather it would mean that LLMs are the wrong algorithm class. Instead, I think sooner or later someone will figure out a different AI paradigm, and then we’ll get superintelligence with shockingly little compute, shockingly little effort, and in shockingly little time. (I’ll quantify that later.)
Basically, I think that there’s a “simple(ish) core of intelligence”, and that LLMs don’t have it. Instead, people are hacking together workarounds via prodigious quantities of (in Ajeya’s terminology) “scale” (a.k.a. compute, §1.5 below) and “schlep” (a.k.a. R&D, §1.7 below). And researchers are then extrapolating that process into the future, imagining that we’ll turn LLMs into ASI via even more scale and even more schlep, up to quantities of scale and schlep that strike me as ludicrously unnecessary and implausible.
1.3.1 Existence proof: the human cortex
The whole cortex is (more-or-less) a uniform randomly-initialized learning algorithm, and I think it’s basically the secret sauce of human intelligence. Even if you disagree with that, we can go up a level to the whole brain: the human brain algorithm has to be simple enough to fit in our (rather small) genome.[4] And not much evolution happened between us and chimps. And yet our one brain design, without modification, was able to invent farming and science and computers and rocket ships and everything else, none of which has any straightforward connection to tasks on the African savannah.
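As a quick back-of-the-envelope on “rather small” (round numbers of my own; and only some fraction of the genome plausibly specifies brain architecture):

```python
# Upper bound on the information content of the whole human genome
# (round numbers; the brain-specifying fraction is much smaller still).
base_pairs = 3.1e9                    # approx. length of the human genome
megabytes = base_pairs * 2 / 8 / 1e6  # 2 bits per base pair
print(f"~{megabytes:.0f} MB")         # roughly 775 MB total
```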
Anyway, the human cortex is this funny thing with 100,000,000 repeating units, each with 6-ish characteristic layers with correspondingly different neuron types and connection patterns, and so on. Nobody knows how it works. You can look up dozens of theories explaining what each of the 6-ish layers is doing and how, but they all disagree with each other. Some of the theories are supported by simulations, but those simulations are unimpressive toy models with no modern practical applications whatsoever.
Remember, if the theories were correct and complete, then they could be turned into simulations able to do all the things that the real human cortex can do[5]—vision, language, motor control, reasoning, inventing new scientific paradigms from scratch, founding and running billion-dollar companies, and so on.
So here is a very different kind of learning algorithm waiting to be discovered, one which we know can scale to AGI, and then to ASI beyond that (per §1.7.2 below). And people are working on it as we speak, and they haven’t succeeded yet, despite decades of work and billions of dollars of resources devoted to figuring it out.
(To be clear, I desperately hope they continue to fail! At least until we have a much better plan for Safe & Beneficial brain-like AGI. See especially §1.8.4 below and the next post.)
1.3.2 Three increasingly-radical perspectives on what AI capability acquisition will look like
Here are three perspectives:
Economists and other people who see AI as a normal technology: “If we want AI to work in some new application area, like some particular industrial design workflow, then humans need to do a lot of R&D work to develop and integrate the AI into this task.”
LLM-focused AGI person: “Ah, that’s true today, but eventually other AIs can do this ‘development and integration’ R&D work for us! No human labor need be involved!”
Me: “No! That’s still not radical enough! In the future, that kind of ‘development and integration’ R&D work just won’t need to be done at all—not by humans, not by AIs, not by anyone! Consider that there are 8 billion copies of basically one human brain design, and if a copy wants to do industrial design, it can just figure it out. By the same token, there can be basically one future AGI design, and if a copy wants to do industrial design, it can just figure it out!”
Another place this comes up is robotics:
Economists: “Humans will need to do R&D to invent good robotics algorithms.”
LLM-focused AGI person: “Future powerful AIs will need to do R&D to invent good robotics algorithms.”
Me: “Future powerful AI will already be a good robotics algorithm!”[6]
…After all, if a human wants to use a new kind of teleoperated robot, nobody needs to do a big R&D project or breed a new subspecies of human. You just take an off-the-shelf bog-standard human brain, and if it wants to pilot a new teleoperated robot, it will just autonomously figure out how to do so, getting rapidly better within a few hours. By the same token, there can be one future AGI design, and it will be able to do that same thing.
1.4 Counter-arguments to there being a far-more-powerful future AI paradigm, and my responses
1.4.1 Possible counter: “If a different, much more powerful, AI paradigm existed, then someone would have already found it.”
I think of this as a classic @paulfchristiano-style rebuttal (see e.g. Yudkowsky and Christiano discuss “Takeoff Speeds”, 2021).
In terms of reference class forecasting, I concede that it’s rather rare for technologies with extreme profit potential to have sudden breakthroughs unlocking massive new capabilities (see here), that “could have happened” many years earlier but didn’t. But there are at least a few examples, like the 2025 baseball “torpedo bat”, wheels on suitcases, the original Bitcoin, and (arguably) nuclear chain reactions.[7]
Also, there’s long been a $1M cash bounty plus eternal fame and glory for solving the Riemann Hypothesis. Why hasn’t someone already solved it? I dunno! I guess it’s hard.
“Ah, but if companies had been putting billions of dollars into solving the Riemann Hypothesis over the last decade, as they have been doing for AI, then the Riemann Hypothesis surely would have been solved by now, right?” I dunno! Maybe! But not necessarily.
“Ah, but if the Riemann Hypothesis is that hard to solve, it must be because the proof is extraordinarily intricate and complicated, right?” I dunno! Maybe! But not necessarily. I think that lots of math proofs are elegant in hindsight, but took a lot of work to discover.
As another example, there was widespread confusion about causal inference for decades before Judea Pearl and others set us straight, with a simple and elegant framework.
So likewise, there can be a “simple(ish) core of intelligence” (§1.3 above) that is taking people a while to discover.
Of course, the strongest argument to me is the one in §1.3.1 above: the human cortex is an existence proof that there are important undiscovered insights in the world of learning algorithms.
1.4.2 Possible counter: “But LLMs will have already reached ASI before any other paradigm can even put its shoes on”
Well, I don’t think LLMs will scale to ASI. Not with multimodal data, not with RL from Verifiable Rewards post-training, not with scaffolding, not with anything else, not soon, not ever. That’s my belief, which I won’t argue for here. Seems like we’ll find out one way or the other quite soon.
(To be clear, I could be wrong, and certainly don’t want to discourage people from contingency-planning for the possibility that souped-up future LLM systems will scale to ASI.)
1.4.3 Possible counter: “If ASI will be part of a different paradigm, who cares? It’s just gonna be a different flavor of ML.”
I dispute the word “just”. Different ML algorithms can be quite different from each other!
I think the new paradigm will bring a shocking phase shift allowing dramatically more capabilities from dramatically less compute (see later sections), along with a shocking phase shift in the difficulty of technical alignment, including proneness to egregious scheming and deception (next post), as compared to current and future LLMs.
1.4.4 Possible counter: “If ASI will be part of a different paradigm, the new paradigm will be discovered by LLM agents, not humans, so this is just part of the continuous ‘AIs-doing-AI-R&D’ story like I’ve been saying”
I have two responses.
First, I disagree with that prediction. Granted, probably LLMs will be a helpful research tool involved in finding the new paradigm, but there have always been helpful research tools, like PyTorch and arXiv and Google, and I don’t expect LLMs to be in a fundamentally different category from those other helpful research tools.
Second, even if it’s true that LLMs will discover the new paradigm by themselves (or almost by themselves), I’m just not sure I even care. I see the pre-paradigm-shift AI world as a lesser problem, one that LLM-focused AI alignment researchers (i.e. the vast majority of them) are already focusing on. Good luck to them. And I want to talk about what happens in the crazy world that we enter after that paradigm shift.
1.5 Training compute requirements: Frighteningly little
We already know that different ML approaches can have different quantitative relationships between compute and performance. For example, Fig. 7 of the classic 2020 “Scaling Laws” paper shows perplexity scaling laws for LSTMs and transformers, and they do not overlay. I expect the next paradigm to be a very different learning algorithm, so the compute-versus-performance curves that we’re used to today are just irrelevant, from my perspective. After the new paradigm, all bets are off.
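To spell out what “do not overlay” means in practice: if loss follows a different power law a·C^(−b) for each architecture, then extrapolating one architecture’s curve tells you essentially nothing about the compute another architecture needs to hit the same performance. Here’s a toy numerical sketch, with coefficients invented purely for illustration (they are not the paper’s actual fits):

```python
import numpy as np

# Toy power-law compute-vs-loss curves for two hypothetical architectures.
# Coefficients are invented for illustration; they are NOT the paper's fits.
def loss(compute, a, b):
    return a * compute ** (-b)   # loss = a * C^(-b)

compute = np.logspace(18, 24, 4)          # training FLOP
arch_A = loss(compute, a=1e2, b=0.066)    # hypothetical "architecture A" (worse exponent)
arch_B = loss(compute, a=3e2, b=0.095)    # hypothetical "architecture B" (better exponent)

for c, la, lb in zip(compute, arch_A, arch_B):
    print(f"{c:.0e} FLOP   A: {la:.2f}   B: {lb:.2f}")

# Compute needed to reach the same target loss differs by orders of magnitude:
target = 2.5
print(f"A needs {(1e2 / target) ** (1 / 0.066):.1e} FLOP; "
      f"B needs {(3e2 / target) ** (1 / 0.095):.1e} FLOP")
```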
Instead, my guess (based largely on lots of opinions about exactly what computations the human brain is doing and how) is that human-level human-speed AGI will require not a data center, but rather something like one consumer gaming GPU—and not just for inference, but even for training from scratch.
So, whereas most people would say “Groups of humans can create $1B/year companies from scratch without any divine intervention, but groups of LLMs cannot create $1B/year companies from scratch without any human intervention. Welp, I guess we need even more training compute…”
…I would instead say “The latest LLMs are the wrong AI paradigm, but next-paradigm AI will be able to do things like that, starting from random initialization, with 1000× less training compute than was being used to train LLMs in 2022!”[8]
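To make that “1000× less” concrete, here’s a rough unit conversion (the GPU throughput and utilization figures are my own illustrative assumptions, and this is just arithmetic, not a defense of the claim):

```python
# Rough unit conversion; throughput/utilization figures are illustrative assumptions.
gpt4_flop = 2e25                       # approx. GPT-4 training compute (Epoch estimate; see footnote 8)
next_paradigm_flop = gpt4_flop / 1000  # "1000x less" => ~2e22 FLOP

sustained_flop_per_gpu = 4e14          # assume an H100-class GPU sustains ~4e14 FLOP/s
gpu_days = next_paradigm_flop / sustained_flop_per_gpu / 86_400
print(f"{next_paradigm_flop:.0e} FLOP ≈ {gpu_days:.0f} GPU-days "
      f"(≈ {gpu_days / 10:.0f} days on ten such GPUs)")
```

The exact numbers don’t matter much; the point is that this is hobbyist-scale compute, not data-center-scale compute.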
I won’t defend that here; see Thoughts on hardware / compute requirements for AGI for some of my thinking.
Instead, I’ll focus on how very low training compute feeds into many of my other beliefs.
1.6 Downstream consequences of “new paradigm with frighteningly little training compute”
1.6.1 I’m broadly pessimistic about existing efforts to delay AGI
I feel strongly that it would be better if AGI were invented later than sooner (other things equal, on the current margin), because I think we have a lot more work to do on technical alignment (among many other things), and we’re making progress but are nowhere near ready, and we need to be doing this work way ahead of time (§1.8.4 below).
…But I’m not sure that actual existing efforts towards delaying AGI are helping.
I think the path from here to AGI is bottlenecked by researchers playing with toy models, and publishing stuff on arXiv and GitHub. And I don’t think most existing public advocacy against building AGI will dissuade those researchers.
The problem is: public advocacy is way too centered on LLMs, from my perspective.[9] Thus, those researchers I mentioned, who are messing around with new paradigms on arXiv, are in a great position to twist “Pause AI” type public advocacy into support for what they’re doing!
“You don’t like LLMs?”, the non-LLM AGI capabilities researchers say to the Pause AI people, “Well how about that! I don’t like LLMs either! Clearly we are on the same team!”
This is not idle speculation—almost everyone that I can think of who is doing the most dangerous kind of AI capabilities research, the kind aiming to develop a new more-powerful-than-LLM AI paradigm, is already branding their work in a way that vibes with safety. For example, see here where I push back on someone using the word “controllability” to talk about his work advancing AI capabilities beyond the limits of LLMs. Ditto for “robustness” (example), “adaptability” (e.g. in the paper I was criticizing here), and even “interpretability” (details).
I think these people are generally sincere but mistaken, and I expect that, just as they have fooled themselves, they will also successfully fool their friends, their colleagues, and government regulators. Well, the government regulators hardly matter anyway, since regulating the activity of “playing with toy models, and publishing stuff on arXiv and GitHub” is a hell of an ask—I think it’s so unlikely to happen that it’s a waste of time to even talk about it, even if it were a good idea all-things-considered.[10]
(I think non-LLM-focused x-risk outreach and education is good and worthwhile. I expect it to be only slightly helpful for delaying AGI, but “slightly helpful” is still helpful, and more importantly outreach and education has many other good effects like bolstering safety research.)
1.6.2 I’m broadly pessimistic about existing efforts towards regulating AGI
Once the new paradigm is known and developed (see below), the actors able to train ASI from scratch will probably number in the tens of thousands, spread all around the world. We’re not just talking about five giant firms with gazillion-dollar data centers, as LLM-focused people tend to imagine.
Thus, for example, if governments know where all the giant data centers are and what code they’re running—well, I guess that’s probably better than governments not knowing that. But I think it’s only marginally helpful, in itself.
(That’s not to say that there is nothing useful happening in the space of regulating AGI. There are various things that would be slightly helpful,[11] and again, slightly helpful is still helpful.)
1.6.3 I expect that, almost as soon as we have AGI at all, we will have AGI that could survive indefinitely without humans
A classic x-risk argument says that ambitious callous AGIs would be motivated to wipe out humans in order to better accomplish their goals. And then a classic anti-x-risk counterargument replies that no, wiping out humans would be a murder-suicide, because there would be no one to run the electrical grid and chip factories etc. And while murder-suicide is a possible AGI motivation, it’s a less likely motivation than the AGI having long-term goals that benefit from its own survival.
Then what’s the pro-x-risk counter-counter-argument?
One approach is to tell a story in which the AGI maneuvers into power, the world then builds ever more chips and robots over a few decades, and human extinction follows (more in “Response to Dileep George: AGI safety warrants planning ahead” §3.3.4 or this Carl Shulman interview).
…But what I really believe is that AGIs could wipe out humans and bootstrap their way back to running the world on their own, after very little prep work—see “What does it take to defend the world against out-of-control AGIs?” §3.3.3 for details. And this hypothesis starts seeming much more plausible if there are already enough chips lying around to run hundreds of millions of human-level human-speed AGIs. And that’s what I expect to be the case.
So again, this isn’t much of a crux for doom, but I still feel like it’s an important ingredient of the picture in my head.
1.7 Very little R&D separating “seemingly irrelevant” from ASI
I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains). How much is “very little”? I dunno, maybe 0–30 person-years of R&D? Contrast that with AI-2027’s estimate that crossing that gap will take millions of person-years of R&D.
Why am I expecting this? I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.
But here are a couple additional hints about where I’m coming from:
1.7.1 For a non-imitation-learning paradigm, getting to “relevant at all” is only slightly easier than getting to superintelligence
I’m definitely not saying that it will be easy to develop the future scary paradigm to ASI from scratch. Instead I’m talking about getting to ASI from the point where the paradigm has already crossed the threshold of being clearly relevant to AGI. (LLMs are already well past this threshold, but the future scary paradigm is obviously not.) In particular, this would be the stage where lots of people believe it’s a path to AGI in the very near future, where it’s being widely used for intellectual work, and/or it’s doing stuff clearly related to the Safe & Beneficial AGI problem, by creating visibly impressive and proto-AGI-ish useful artifacts.
It takes a lot of work to get past that threshold! Especially given the existence of LLMs. (That is: the next paradigm will struggle to get much attention, or make much money, until the next paradigm is doing things that LLMs can’t do—and LLMs can do a lot!)
Why do I think getting to “relevant at all” takes most of the work? This comes down to a key disanalogy between LLMs and brain-like AGI, one which I’ll discuss much more in the next post.
The power of LLMs comes almost entirely from imitation learning on human text. This leads to powerful capabilities quickly, but with a natural ceiling (i.e., existing human knowledge), beyond which it’s unclear how to make AI much better.
Brain-like AGI does not involve that kind of imitation learning (again, more in the next post). Granted, I expect brain-like AGI to also “learn from humans” in a loose sense, just as humans learn from other humans. But the details are profoundly different from the kind of imitation learning used by LLMs. For example, if Alice says something I don’t understand, I will be aware of that fact, and I’ll reply “huh?”. I won’t (usually) just start repeating what Alice says in that same context. Or if I do, this will not get me to any new capability that LLMs aren’t already covering much better. LLMs, after all, are virtuosos at simply repeating what they heard people say during pretraining, doing so with extraordinary nuance and contextual sensitivity.
As another suggestive example, kids growing up exposed to grammatical language will learn that language, but kids growing up not exposed to grammatical language will simply create a new grammatical language from scratch, as in Nicaraguan Sign Language and creoles. (Try training an LLM from random initialization, with zero tokens of grammatical language anywhere in its training data or prompt. It’s not gonna spontaneously emit grammatical language!) I think that’s a good illustration of why imitation learning is just entirely the wrong way to think about what’s going on with brain algorithms and brain-like AGI.
For brain-like AGI, all the potential blockers to ASI that I can imagine would also be potential blockers for crossing that earlier threshold of being clearly relevant to AGI at all, a threshold that requires using language, performing meaningful intellectual work that LLMs can’t do, and so on.
Instead of imitation learning, a better analogy is to AlphaZero, in that the model starts from scratch and has to laboriously work its way up to human-level understanding. It can’t just regurgitate human-level understanding for free. And I think that, if it can climb up to human-level understanding, it can climb past human-level understanding too, with a trivial amount of extra R&D work and more training time—just as, by analogy, it takes a lot of work to get AlphaZero to the level of a skilled human, but then takes very little extra work to make it strongly superhuman.
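As a cartoon of that last point, here’s a toy multi-armed bandit of my own construction (not a model of how either LLMs or brains actually work): the imitator tops out at its demonstrator’s level, while the learner that gets feedback from the task itself climbs past the demonstrator.

```python
import numpy as np

# Toy "imitation ceiling" cartoon: an imitator is capped by its demonstrator,
# while a feedback-driven learner can surpass the demonstrator.
rng = np.random.default_rng(0)
true_payoffs = np.array([0.2, 0.5, 0.9])   # arm 2 is actually best
human_policy = np.array([0.1, 0.8, 0.1])   # the "demonstrator" mostly picks arm 1

# Imitation learning: copy the demonstrator's choice frequencies from demonstrations.
demos = rng.choice(3, size=10_000, p=human_policy)
imitator_policy = np.bincount(demos, minlength=3) / len(demos)

# Feedback-driven learning: estimate payoffs by trial and error, then act greedily.
estimates, counts = np.zeros(3), np.zeros(3)
for _ in range(10_000):
    arm = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(estimates))
    reward = float(rng.random() < true_payoffs[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
learner_policy = np.eye(3)[int(np.argmax(estimates))]

print("demonstrator:", human_policy @ true_payoffs)      # ~0.51
print("imitator:    ", imitator_policy @ true_payoffs)   # ~0.51 (ceiling = demonstrator)
print("learner:     ", learner_policy @ true_payoffs)    # ~0.90 (surpasses demonstrator)
```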
And speaking of strongly superhuman:
1.7.2 “Plenty of room at the top”
The human brain algorithm has lots of room for capabilities improvement, including (1) more neurons, (2) speed, (3) motivation (e.g. intellectual curiosity, being interested in ideas and getting things done rather than status and gossip), (4) anything else that makes human geniuses tower over human midwits, but much more of it, (5) things like cloning, merging weight-updates from clones, high-bandwidth communication, etc. More at Response to Blake Richards: AGI, generality, alignment, & loss functions §3.2.
1.7.3 What’s the rate-limiter?
One Paul-Christiano-style counterargument (cf. his post “Takeoff speeds”) would be: “All those things you listed under ‘plenty of room at the top’ above for why AGIs can outperform humans—scale, speed, cloning, etc.—are things that could happen before, not after, human-level, making up for some other deficiency, as opposed to your implied suggestion that we’ll get to human-level in a human-brain-like way first, and only then rapidly scale, speed it up, clone many copies, etc.”
My rebuttal is: for a smooth-takeoff view, there has to be some correspondingly-slow-to-remove bottleneck that limits the rate of progress. In other words, you can say “If Ingredient X is an easy huge source of AGI competence, then it won’t be the rate-limiter, instead something else will be”. But you can’t say that about every ingredient! There has to be a “something else” which is an actual rate-limiter, that doesn’t prevent the paradigm from doing impressive things clearly on track towards AGI, but that does prevent it from being ASI, even after hundreds of person-years of experimentation.[13] And I’m just not seeing what that could be.
Another point is: once people basically understand how the human brain figures things out in broad outline, there will be a “neuroscience overhang” of 100,000 papers about how the brain works in excruciating detail, and (I claim) it will rapidly become straightforward to understand and integrate all the little tricks that the brain uses into AI, if people get stuck on anything.
1.8 Downstream consequences of “very little R&D separating ‘seemingly irrelevant’ from ‘ASI’”
1.8.1 Very sharp takeoff in wall-clock time
I wind up feeling like the wall-clock time between the new paradigm being “seemingly irrelevant to AGI” and ASI is, I dunno, two years on the high side, and zero on the low side.
Specifically, on the low side, I wouldn’t rule out the possibility that a single training run is the first to surpass both the “clearly relevant to AGI” threshold and the ASI threshold, in which case they would happen basically simultaneously (perhaps within the same week).
To be clear, the resulting ASI after those 0–2 years would not be an AI that already knows everything about everything. AGI and ASI (in my opinion) aren’t about already knowing things, but rather they’re about not knowing things, yet being able to autonomously figure them out (§1.7.1 above). So the thing we get after the 0–2 years is an AI that knows a lot about a lot, and if it wants to dive deeper into some domain, it can do so, picking it up with far more speed, depth, and insight than any human could.
Think of an army of a million super-speed telepathic scaled-up John von Neumann clones. If you ask them some question about cryptocurrency, then maybe they won’t know the answer off the top of their head, because maybe it happens that there wasn’t any information about cryptocurrency in their training environment to date. But then they’ll go spend a day of wall-clock time (≈ months or years of subjective time) reading up on cryptocurrency and all its prerequisites, and playing with the code, and so on, and then they’ll have a deep, beyond-world-expert-level understanding.
1.8.1.1 But what about training time?
Even if the next paradigm requires very few person-years of R&D to get from “clearly relevant to AGI” to “actual ASI”, it may still take a long time if the individual training runs are slow. But I don’t think that will be much of a limiter.
Instead, I expect that the next paradigm will involve so little compute, and be so amenable to parallelization, that trainings from “birth” (random initialization) to adult-human-level will take maybe a couple weeks, notwithstanding the fact that human brains require decades.[14] And I think picking the low-hanging fruit of efficiency and parallelization will happen early on, probably during the earlier “seemingly irrelevant” stage—why would anyone ever run a year-long training, when they can instead spend a few months accelerating and parallelizing the algorithm, and then run the same training much faster?
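As a quick consistency check on “a couple weeks” (with illustrative numbers of my own choosing; cf. footnote 14):

```python
# Illustrative numbers of my own choosing, not precise claims.
human_learning_days = 20 * 365   # ~20 subjective years to reach adult-level competence
combined_speedup = 500           # assumed serial speedup x parallel "lives" x efficiency gains
print(human_learning_days / combined_speedup, "days of wall-clock training")  # ~14.6 days
```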
1.8.1.2 But what if we try to make takeoff smoother?
The wall-clock time for takeoff depends in part on people making decisions, and people could decide to go actually slowly and incrementally. Even in the “single training run” case, heck, in principle, that training run could happen over the course of a zillion years, with gradient descent being performed by one guy with an abacus. But given how little compute and R&D are involved in getting to ASI, I think the only way to get deliberate slowdown would involve excellent secrecy on the algorithms, and one group (or consortium) way in the lead, and then this one group “burns their lead” in order to do incremental testing and other safety interventions.[15]
We should keep possibilities like that in mind. But I see it as realistically making takeoff smoother by months, at best, not years.
1.8.2 Sharp takeoff even without recursive self-improvement
As mentioned above, some LLM-focused people like the AI-2027 authors agree with me about takeoff being pretty sharp, with the world radically changing over the course of months rather than years. But they get that conclusion via a very different path than I do.
Recall from Bostrom (2014) the (qualitative) formula:
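Rate of change of intelligence = Optimization power / Recalcitrance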
The LLM-focused people get fast “rate of change of intelligence” under an assumption that “recalcitrance” (difficulty of improving AI) is high and steeply increasing, but the “optimization power” brought to bear on improving AI is even higher, and even more steeply increasing.
Whereas I think we’re in the wrong paradigm today, but when that changes, recalcitrance will be quite low, at least across the range from “doing anything impressive whatsoever” to ASI. So we’ll get sharp takeoff (across that range) even without any particular increase in optimization power being applied to AI research.
1.8.2.1 …But recursive self-improvement could also happen
Of course, somewhere between “doing anything impressive whatsoever” and ASI, we’ll get AIs that can do excellent AI capabilities research. And that could make takeoff faster still.[16] But I don’t think that would change my general picture very much; it would just shorten this already-short period a bit further, by effectively clipping off the end of it.
This is an area where I kinda disagree with not just Paul Christiano but also Eliezer, who historically has seemed to put a lot of emphasis on the ability of AI to do excellent AI R&D. I think where Eliezer was coming from (see e.g. Intelligence Explosion Microeconomics (2013) p56) was: human brains are comically inefficient (in his view), and human institutions even more so, and thus AI is going to be much better than humans at AI R&D, leading to rapid self-improvement. Whereas I think that’s kinda missing the point, because by the time AI is already that good at AI R&D, we’re already after the critical and controversial part. Remember the “simple(ish) core of intelligence” in §1.3 above—I think AI will get that good at AI R&D via a kind of competence that generalizes into every other domain too.
In other words, I think that, if you understand the secret sauce of the human brain, then you straightforwardly and quickly get to an ASI at the level of a million super-speed telepathic scaled-up John von Neumann clones. Then Eliezer would respond: “Ah, but then that super John von Neumann clone army would be able to do some kick-ass AI research to make their algorithms even more powerful still!” And, yeah! That’s true! But by that point, does it even matter?
1.8.3 Next-paradigm AI probably won’t be “deployed” at all, and ASI will probably show up in a world not wildly different from today’s
A lot of things seem to point in that direction, including:
As above, I expect that, even two years (or much less) before ASI, the next-paradigm AIs will not be able to appreciably help with any intellectual work of note, and will be generally unimpressive, little-known, little-used, and incapable of doing anything at all that superficially rings of AGI (i.e., far less impressive and useful than LLMs today);
The compute requirements will be so low that there will be little pressure to get immediate revenue—the R&D won’t be bottlenecked on getting new investments to buy or rent ever more compute;
Any engineer working towards deploying the current best model could instead be spending her time on making the model work much better—a project which will have a ton of very obvious low-hanging fruit, and be amenable to cheap parallel experimentation;
There may be safety or ethical[17] concerns delaying the deployment of these new-paradigm AIs;
Or more generally, it might just take some time while people are studying and getting a sense for these AIs, before offering them as a widespread service. (And even longer before many people start using this new service.)
Indeed, I’m not even sure if there will be much “internal deployment” to speak of, for the same reasons. I think ASI may well arrive before the developers have really gotten past the stage of testing and exploration.
So I think the Eliezer-ish scenario where strong superintelligence escapes onto the internet, in a world otherwise much like today, is quite plausible, and is my central expectation right now.
Of course, the future world won’t be exactly like today. It will presumably have more and better chips. It will have better, cheaper, and far more widespread LLMs, and people will take them for granted, complain about them, and/or forget what life was like before them, just as we do now for cell phones and social media. The already-ongoing semantic bleaching of the terms “AGI” and “ASI” will continue, until the terms become just meaningless AI company marketing speak. Various things will happen in geopolitics. Perhaps some early version of next-paradigm-AI will be getting used profitably in e.g. the robotics sector.
…But nothing like the kind of obvious common-knowledge pre-ASI craziness envisioned in Paul-style smooth-takeoff scenarios (e.g. “There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles.”).
1.8.4 We better sort out technical alignment, sandbox test protocols, etc., before the new paradigm seems even relevant at all, let alone scary
Needless to say, if I’m right, then we need to be doing serious prep work for this next-paradigm AI, even while this next-paradigm AI is obscure, seemingly irrelevant, and only good for running toy-model demos or unsexy niche applications. Or maybe before they’re even good for any of that!
Luckily, if the next paradigm is brain-like AGI, as I expect, then we can study brains right now, and thus have at least something to go on in understanding the nature of the threat and what to do about it. That’s of course what I’m working on myself.
1.8.5 AI-assisted alignment research seems pretty doomed
The obvious, well-known problem with AI-assisted alignment research is the chicken-and-egg problem. Unaligned AIs won’t actually care about robustly solving the alignment problem. So at best, the AIs will care only about impressing us—and we have abundant empirical evidence that people can be impressed by incorrect alignment ideas. At worst, the AIs will be trying to deceive and manipulate us. See further discussion in §4 of my post “Reward Button Alignment”.
But in the context of this post, we have an additional problem on top of that: I expect that, once the next-paradigm AIs are competent enough to meaningfully contribute to alignment research at all, they will be very easily able to invent ASI. Inventing ASI will be (at that point) much, much easier than alignment research—the former will entail just a bit more iteration and putting pieces together (since we’ll already be almost there!), whereas the latter will entail tricky conceptual work, anticipating novel problems, and out-of-the-box thinking.
(I’m talking specifically about getting help from AI-of-the-next-paradigm. A different topic is getting help from LLMs. I’m all for getting help from LLMs where possible! But as I mentioned in §1.4.4 above, I expect that the role of LLMs is, and will continue to be, as a mundane productivity enhancer in the same bucket as Integrated Development Environments (IDEs), PyTorch, arXiv, Google, etc.,[18] as opposed to an autonomous researcher akin to humans. I just don’t think they’ll get that good.)
1.8.6 The rest of “AI for AI safety” seems pretty doomed too
@Joe Carlsmith’s “AI for AI safety” brings up three categories of things to do with AI to make the ASI risk situation better:
Safety progress: our ability to develop new levels of AI capability safely,
Risk evaluation: our ability to track and forecast the level of risk that a given sort of AI capability development involves, and
Capability restraint: our ability to steer and restrain AI capability development when doing so is necessary for maintaining safety.
I don’t really see any of these things working, at least not in the form that Joe and the other “AI Control” people seem to be imagining. Takeoff, sharp as it is, will get very much sharper still if word gets out about how this kind of AI works, and then there’d be no time to get anything done. (And “capability restraint” via governance would be off the table, given how little compute is required, see §1.5–§1.6 above.) Or if things stay mum, then that rules out public risk evaluations, widespread alignment research, or most kinds of AI-assisted societal resilience.
Moreover, the continuous learning nature of the future paradigm (see §1 of “Sharp Left Turn” discourse: An opinionated review) would mean that “AI capabilities” are hard to pin down through capabilities elicitation—the AI might not understand something when you test it, but then later it could figure it out.
(See also §2.6 of the next post on further challenges of weaker AIs supervising stronger AIs.)
Instead, the only forms of “AI for AI safety” that seem plausible to me are much closer to what Eliezer and others were talking about a decade ago: (1) “pivotal acts” (which, as Scott Alexander points out, will feel quite different if people actually find themselves living inside the scenario that I expect), and (2) very powerful AIs with good motivations, not straightforwardly following human instructions, but rather doing what they think is best. I won’t justify that in detail; it’s out of scope.
1.8.7 Decisive Strategic Advantage (DSA) seems hard to avoid
An AI with a DSA is one that could unilaterally crush or co-opt all competition, should it choose to. This would constitute a terrifying single-point-of-failure for the whole future. Thus, some people understandably wonder whether we could just, y’know, not have that happen. For example, @Joe Carlsmith’s On “first critical tries” in AI alignment: “I think we should try to make it the case that no AI system is ever in a position to kill everyone and take over the world.”
I’ll leave aside the question of whether DSAs are bad—wait, sorry, they’re definitely bad. But maybe every option is bad, in which case we would have to figure out which option is least bad.[19] Anyway, my goal in this subsection is to argue that, assuming we want to avoid a DSA, I don’t see any way to do that.
A useful notion (after Eliezer via Paul Christiano) is “free energy”, meaning unexploited opportunities that an AI might use to gain power and influence. It includes profitable opportunities that have not yet been taken. It includes chips that have neither been already hacked into, nor secured, nor had their rental price massively bid upwards. It includes brainwashable humans who have neither been already brainwashed, nor been defended against further brainwashing. Things like that.
Free energy depends on competence: the very same environment may have no free energy for a human, nor for a midwit AI, but tons of free energy for a superintelligent AI.
(Free energy also depends on motivation: an opportunity to extort humans by threatening a bioweapon would constitute “free energy” for an AI that doesn’t care about human welfare or norms, but not for an AI that does. But I’ll put that aside—that gets into offense-defense balance and other issues outside the scope of this series.)
Anyway, Paul Christiano suggests that “aligned AI systems can reduce the period of risk of an unaligned AI by … consuming the ‘free energy’ that an unaligned AI might have used to grow explosively.”
Well, my concern is that when this next paradigm goes from “basically useless” to “million super-speed scaled-up telepathic John von Neumanns” in two years, or maybe much less than two years, there’s just an extraordinary amount of free energy appearing on the scene, very fast. It’s like a Mount-Everest-sized pile of gunpowder that will definitely be consumed within a matter of months. It’s pleasant to imagine this happening via a very distributed and controlled gradual burn. But c’mon. There’s gonna be a massive explosion.
Like, suppose I’m wrong about blasting through human level, and instead we get midwit AGIs for five years, and they get deployed in a widespread, distributed way on chips around the world. Does that use up the free energy? No, because the million-John-von-Neumann ASI is still going to come along after that, and wherever it shows up, it can (if it chooses to) crush or outbid all the midwit AGIs, make crazy nanotech stuff, etc.
Ah, but what if there are not two but three steps from world-very-much-like-today to ASI? Midwit AI for a couple years, then genius AI for a couple years, then million-super-speed-John-von-Neumann ASI after that? Then I claim that at least one of those three steps will unlock an extraordinary amount of free energy, enough to easily crush everything that came before and grab unprecedented power. Ah, but what if it’s five steps instead of three? Ditto. The amount of gradualism necessary to fundamentally change this dynamic is far more gradual than I see as plausible. (Again, my central guess is that there will be no deployment at all before ASI.)
Ah, but what if we ban closed-source AI? Nope, I don’t think it helps. For one thing, that will just make takeoff even sharper in wall-clock time. For another thing, I don’t think that’s realistically enforceable, in this context where a small group with a few chips can put the pieces together into a system of vastly greater competence. For yet another thing, I think there are first-mover advantages, and an unstable dynamic in which “power begets power” for these future AIs. For example, the first AI to steal some chips will have extra competence with which to go after more chips—recall the zombie apocalypse movies, where ever more zombies can create ever more zombies. (Except that here, the zombies are superhumanly ambitious, entrepreneurial, patient, etc.) Or they can use the extra compute to self-improve in other ways, or subvert competition.
Ah, but what if some AI safely detonates the free energy by making the world resilient against other powerful AIs—e.g. it autonomously hacks into every data center on Earth, hardens the security (or just destroys the chips!), maybe deploys a “gray goo defense system” or whatever, and then deletes itself? Well, that same AI clearly had a DSA! It’s just that it didn’t use its extraordinary power to install itself as a permanent Singleton—cf. “AI nanny” or “pivotal act”. By the same token, one could imagine good outcomes like an AI that sets up a “long reflection” and defers to the results, shutting itself down when appropriate. Or an AI could gather power and hand it over to some particular human or institution. Many possibilities. But they still involve some AI having a DSA at some point. So they still involve a giant terrifying single point of failure.
1.9 Timelines
I don’t know when the next paradigm will arrive, and nobody else does either. I tend to say things like “probably 5 to 25 years”. But who knows! For what it’s worth, here are some thoughts related to why I picked those numbers:
For long-timeline readers who think “probably 5–25 years” is too low:
I don’t think 2030 is so soon that we can confidently rule out ASI arriving by then. A lot can happen in five years. Five years is how long it took to get from “LLMs don’t even exist at all” in 2018 to GPT-4 in 2023. And that’s an underestimate of how fast things can move, because the path from 2018 to GPT-4 involved a number of bottlenecks that the next paradigm won’t face—particularly building huge data centers and training up a huge pool of experts in machine learning, parallelization, hardware acceleration, and so on.
Going back a bit further, the entirety of deep learning was a backwater as recently as 2012, a mere 13 years ago.
A different argument goes: “the brain is so ridiculously complicated, and we’re so far from reverse-engineering it, that brain-like AGI could very well take much longer than 25 years”. For my response to that school of thought, see Intro to Brain-Like-AGI Safety §2.8, §3.7, and §3.8. To be clear, it could be more than 25 years. Technological forecasting is very hard. Can’t rule anything out. What do I know?
For short-timeline readers who think “probably 5–25 years” is too high:
I don’t think 2050 is so far away that we can confidently rule out that ASI will take that long. See discussion in §1.4.1 above.
I’m also skeptical that people will get there in under 5 years, just based on my own inside view of where people are at right now and the pace of recent progress. But again, who knows? I don’t rule anything out.
1.9.1 Downstream consequences of timelines
A lot of people seem to believe that either LLMs will scale to AGI within the next couple years, or this whole AGI thing is stupid hype.
That’s just so insane to me. If AGI is 25 years away (for the sake of argument), that still obviously warrants urgent planning right now. People routinely plan that far out in every other domain—climate change, building infrastructure, investing in personal health, saving for retirement, etc.
For example, if AGI is 25 years away, then, in my estimation, I’m much more likely to die from ASI apocalypse than from all other causes combined. And I’m not even that young! This is a real thing coming up, not a far-off abstract fantasy-land scenario.
Other than that, I don’t think it’s terribly decision-relevant whether we get ASI in 5 years versus 25 years, and accordingly I don’t spend much time thinking about it. We should obviously be contingency-planning for both.
1.10 Conclusion
Now you know the kind of “foom” I’m expecting: the development of strong superintelligence by a small group working on a new AI paradigm, with essentially no warning and very few resources, leaving us with meagre hope of constraining this radical transition via conventional balance-of-power or governance mechanisms, and very little opportunity to test and iterate on any system remotely similar to the future scary ones.
So we need to be working frantically on technical alignment, sandbox test protocols, and more generally having a plan, right now, long before the future scary paradigm seems obviously on the path to AGI.
(And no, inventing that next AI paradigm is not part of the solution, but rather part of the problem, despite the safety-vibed rhetoric of the researchers who are doing exactly that as we speak—see §1.6.1.)
I am very unhappy to hold that belief, and it’s an unpopular belief in the era of LLMs, but I still think it’s true.
If that’s not bad enough, the next post will argue that, absent some future conceptual breakthrough, this kind of AI will be egregiously misaligned, deceptive, and indifferent to whether its users, programmers, or anyone else lives or dies. Next post: doom!
Thanks Charlie Steiner, Ishaan Koratkar, Seth Herd, and Justis Mills for critical comments on earlier drafts.
- ^
For example, (1) On the foom side, Paul Christiano brings up Eliezer Yudkowsky’s past expectation that ASI “would likely emerge from a small group rather than a large industry” as evidence against Eliezer’s judgment and expertise here [disagreement 12] and as “improbable and crazy” here. (2) On the doom side, the “literal genie” / “monkey’s paw” thing, where an AI would follow a specification literally, with catastrophic consequences, as opposed to interpreting natural-language requests with common sense, has likewise largely shifted from a doomer talking point to an anti-doomer mocking point. But I still believe in both those things—see §1.7 and §2.4 respectively.
- ^
“LLM” means “Large Language Model”. I’m using it as a synonym for a big class of things, also called “foundation models”, that often include multi-modal capabilities, post-training, tool use, scaffolding, and so on.
- ^
For example, this category includes pretty much everyone at OpenAI, Anthropic, DeepMind, OpenPhil, GovAI, CSET, the AISIs, and on and on.
As another example, I just randomly opened up Alignment Forum, and had to scroll through 20 posts before I found even one that was not related to the alignment properties of today’s LLMs, or otherwise premised on LLMs scaling continuously to ASI.
More broadly, it’s increasingly common in the discourse for people to simply equate “AI” with “LLMs” (as if no other type of AI exists?), and to equate “ASI” with “ASI before 2030 via pure scaling of LLMs” (as if 2040 or 2050 were a distant abstract fantasy-land?). This leads to an endless fountain of bad takes from all sides, which I frequently complain about (1, 2, 3, 4, …).
- ^
- ^
…in conjunction with the thalamus, basal ganglia, etc.
- ^
Someone still needs to do R&D for the hardware side of robotics, but not much! Indeed, teleoperated robots seem to be quite capable and inexpensive already today, despite very low demand.
- ^
Could nuclear chain reactions have happened many years earlier? The obvious answer is no: they were bottlenecked by advances in nuclear physics. Ah, but what if we lump together the nuclear chain reactions with all the supporting theory, and ask why that whole package couldn’t have happened many years earlier? But more to the point, if a historical lack of understanding of nuclear physics was a bottleneck delaying nuclear chain reactions, isn’t it likewise possible that a current lack of understanding of [????] is a bottleneck delaying that next AI paradigm today?
- ^
The training of GPT-4 used 2e25 FLOP (source: Epoch), and it probably happened mostly during 2022.
- ^
I imagine public advocates responding by saying something like:
Well, we could remove LLMs from the narrative, and talk in more general terms about how AGI / ASI is some future technology, to be invented at some future date, and here’s why it’s dangerous and why we should urgently prepare for it right now via safety research, institution building, etc. Indeed, we x-risk people were saying exactly that message 10 years ago, and we were saying it 20 years ago, and we were saying it all the way back to Alan Turing 75 years ago. And nobody gave a shit! The vast majority of people, even AI experts, only started paying the slightest attention to AI x-risk when the message changed to: ‘Y’know, those LLMs, the ones that you can see with your own eyes? We’re talking about those. Or maybe, at most, the next generation of those, which are already being built.’. And that message—man, it’s not even our message! It’s a mutant cousin of our message, which, being far more memetically fit, drowned out our actual more nuanced message in the popular discourse.
And … yeah, sigh, I dunno.
- ^
You can’t put nuclear secrets on arXiv, but I find it hard to imagine AI toy model papers ever winding up in that category, even if it were objectively a good idea. See also the time that the USA put export restrictions on an algorithm; not only did the restrictions utterly fail to prevent proliferation, but they were also struck down as unconstitutional!
- ^
Other examples of probably-helpful-on-the-margin governance work: (1) it would be nice if governments would publicly announce that AI companies can collaborate for safety reasons without falling afoul of antitrust law; (2) maybe something about liability, e.g. this idea? No strong opinions, I haven’t thought about it much.
- ^
Things that qualify as “impressive and proto-AGI-ish” would include helping with AI alignment research, or AI capabilities research, or bioweapons research, or unlocking huge new commercial opportunities, or even just being “visibly intelligent”. LLMs (unlike next-paradigm AIs) are already well into the “impressive and proto-AGI-ish” stage, which by the way is a much lower bar than what Redwood Research people call “transformatively useful AI”.
An important aspect is the question of whether there’s widespread belief that this paradigm is a path to AGI, versus whether it’s just another exploratory subfield of AI. As an analogy, think of probabilistic programming today—it beats a few benchmarks, and it has a few niche commercial applications, and it has some enthusiastic boosters, but mostly nobody cares. (No offense!) My claim is that, very shortly before ASI (in terms of both wall-clock time and R&D effort), the algorithms that will develop into ASI will be similarly niche. That could be true even if the algorithms have some profitable commercial applications in robotics or whatever.
- ^
Or I suppose the rate-limiter could be that there are 10,000 “something else”s; but see discussion of “simple(ish) core of intelligence” in §1.3 above.
- ^
I’m assuming 100+-fold speedup compared to humans from a mix of serial speedup, parallelization (see discussion of “parallel experiences” here), and various human inefficiencies (relative to our goals with AGI). By the way, I mentioned in §1.5 that I think training-from-scratch will be possible with extraordinarily little compute, like a single consumer GPU—and if a single consumer GPU is really all that a researcher had, then maybe training-from-scratch would take many months. But what I actually expect is that researchers will at least be using ten H100s or whatever for their training runs, which is far more powerful, while still being very inexpensive, widely available, and all-but-impossible to track or govern.
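As a rough, hedged sketch of how a speedup like that cashes out in wall-clock time; every number below is an illustrative assumption, not a claim from the post:

```python
# Illustrative arithmetic only: how a "100+-fold speedup" could compress
# person-years of R&D into wall-clock time. Every number is an assumption.
person_years_of_rnd = 30       # upper end of a "0-30 person-years" range
serial_speedup = 10            # assumed thinking speed relative to a human
parallel_copies = 20           # assumed copies working in parallel
effective_speedup = serial_speedup * parallel_copies   # 200x, i.e. "100+-fold"

wall_clock_years = person_years_of_rnd / effective_speedup
print(f"~{wall_clock_years * 12:.1f} months of wall-clock time")  # ~1.8 months
```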
- ^
I’m stating a possibility, not saying that I expect people to actually do this. As the doomer refrain goes: “I do not expect us to die with that much dignity.” See also: “Aligning an AGI adds significant development time” (which I mostly agree with).
- ^
I say “could” instead of “will” because it’s at least conceivable that humans will remain in control and choose to not have AIs work on AI capabilities research.
- ^
I expect the future-scary-paradigm AIs to have a pretty obvious (and IMO legitimate) claim to phenomenal consciousness and moral patienthood, much more than LLMs do, thanks to the future scary AIs operating on human-brain-like algorithmic principles. Of course, I don’t know whether future developers will notice or care, and if they do, I don’t know how they’ll act on it. But still, I think the general dismissal of LLM welfare today (pace Anthropic hiring one guy to think about it) is not necessarily indicative of what will happen with the next paradigm.
- ^
For the record, a poll of my X followers says that LLMs are a bigger boon to programming than IDEs, although a substantial minority disagreed. Note the obvious caveats that future LLMs will be better than today’s LLMs and that some of my X followers may not be skilled users of LLMs (or IDEs, for that matter).
- ^
E.g. Michael Nielsen’s ASI existential risk: reconsidering alignment as a goal emphasizes that multipolar AI scenarios may lead to doom via unsolvable coordination problems around destructive technologies, related to the Vulnerable World Hypothesis. That seems bad! But the DSA thing seems bad too! Again, I’m not taking a stand here, just trying to understand the situation.
- Foom & Doom 2: Technical alignment is hard by 23 Jun 2025 17:19 UTC; 152 points) (
- AI #123: Moratorium Moratorium by 3 Jul 2025 15:40 UTC; 33 points) (
- If your AGI definition excludes most humans, it sucks. by 22 Jul 2025 10:33 UTC; 18 points) (
- We should think about the pivotal act again. Here’s a better version of it. by 28 Aug 2025 9:29 UTC; 11 points) (
- 24 Aug 2025 17:25 UTC; 9 points) 's comment on Buck’s Shortform by (
- AI-202X: a game between humans and AGIs aligned to different futures? by 1 Jul 2025 23:37 UTC; 2 points) (
- 1 Aug 2025 1:00 UTC; 1 point) 's comment on I am worried about near-term non-LLM AI developments by (
In this comment, I’ll try to respond at the object level, arguing for why I expect slower takeoff than “brain in a box in a basement”. I’d also be down to try to do a dialogue/discussion at some point.
I think the way you describe this argument isn’t quite right. (More precisely, I think the argument you give may also be a (weaker) counterargument that people sometimes say, but I think there is a nearby argument which is much stronger.)
Here’s how I would put this:
Prior to having a complete version of this much more powerful AI paradigm, you’ll first have a weaker version of this paradigm (e.g. you haven’t figured out the most efficient way to do the brain algorithm, etc.). Further, the weaker version of this paradigm might initially be used in combination with LLMs (or other techniques) such that it (somewhat continuously) integrates into the old trends. Of course, large paradigm shifts might cause things to proceed substantially faster or bend the trend, but not necessarily.
Further, we should still broadly expect this new paradigm will itself take a reasonable amount of time to transition through the human range and through different levels of usefulness, even if it’s very different from LLM-like approaches (or other AI tech). And we should expect this probably happens at massive computational scale, where it will first be viable given some level of algorithmic progress (though this depends on the relative difficulty of scaling things up versus improving the algorithms). As in, more than a year prior to the point where you can train a superintelligence on a gaming GPU, I expect someone will train a system which can automate big chunks of AI R&D using a much bigger cluster.
On this prior point, it’s worth noting that many of Paul’s original points in Takeoff Speeds are totally applicable to non-LLM paradigms, as is much of what Yudkowsky and Christiano discuss in “Takeoff Speeds”. (And I don’t think you compellingly respond to these arguments.)
I think your response is that you argue against these perspectives under ‘Very little R&D separating “seemingly irrelevant” from ASI’. But, I just don’t find these specific arguments very compelling. (Maybe also you’d say that you’re just trying to lay out your views rather than compellingly arguing for them. Or maybe you’d say that you can’t argue for your views due to infohazard/forkhazard concerns. In which case, fair enough.) Going through each of these:
I don’t buy that having a “simple(ish) core of intelligence” means it won’t take a long time to arrive at the resulting algorithms. I’d say that much of modern LLMs does have a simple core and you could transmit this using a short 30 page guide, but nonetheless, it took many years of R&D to reach where we are now. Also, I’d note that the brain seems way more complex than LLMs to me!
My main response would be that basically all paradigms allow for mixing imitation with reinforcement learning. And, it might be possible to mix the new paradigm with LLMs which would smooth out / slow down takeoff.
You note that imitation learning is possible for brains, but don’t explain why we won’t be able to mix the brain-like paradigm with more imitation than human brains use, which would smooth out takeoff. As in, yes, human brains don’t use as much imitation as LLMs, but they would probably perform better if you modified the algorithm some and did 10^26 FLOP worth of imitation on the best data. This would smooth out the takeoff.
I’ll consider responding to this in a comment responding to the next post.
Edit: it looks like this is just the argument that LLM capabilities come from imitation due to transforming observations into behavior in a way humans don’t. I basically just think that you could also leverage imitation more effectively to get performance earlier (and thus at a lower level) with an early version of a more brain-like architecture, and I expect people would do this in practice to see earlier returns (even if the brain doesn’t do this).
Notably, in the domains of chess and Go it actually took many years to make it through the human range. And it was possible to leverage imitation learning and human heuristics to perform quite well at Go (and chess) in practice, up to systems which weren’t that much worse than humans.
AlphaZero exhibits returns which are maybe like 2-4 SD (within the human distribution of Go players supposing ~100k to 1 million Go players) per 10x-ing of compute.[1] So, I’d say it probably would take around 30x to 300x additional compute to go from skilled human (perhaps 2 SD above median) to strongly superhuman (perhaps 3 SD above the best human or 7.5 SD above median) if you properly adapted to each compute level. In some ways 30x to 300x is very small, but also 30x to 300x is not that small...
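Spelling out that arithmetic (a rough sketch; the SD-per-10x figures and the endpoints are the estimates above, not measurements):

```python
# Rough arithmetic behind the "30x to 300x" range above. Assumes returns of
# 2-4 SD of Go skill per 10x of compute, and a gap of roughly 5.5 SD between
# "skilled human" (~2 SD above median) and "strongly superhuman" (~7.5 SD).
sd_gap = 7.5 - 2.0
for sd_per_oom in (2.0, 4.0):
    compute_multiplier = 10 ** (sd_gap / sd_per_oom)
    print(f"{sd_per_oom} SD per 10x -> ~{compute_multiplier:.0f}x more compute")
# 4 SD per 10x -> ~24x; 2 SD per 10x -> ~560x, i.e. roughly the quoted range.
```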
In practice, I expect returns more like 1.2 SD / 10x of compute at the point when AIs are matching top humans. (I explain this in a future post.)
I agree with this.
I’d say that the rate limiter is that it will take a while to transition from something like “1000x less compute efficient than the human brain (as in, it will take 1000x more compute than a human lifetime to match top human experts, but simultaneously the AIs will be better at a bunch of specific tasks)” to “as compute efficient as the human brain”. Like, the actual algorithmic progress for this will take a while, and I don’t buy your claim that the way this will work is that you’ll go from nothing to having an outline of how the brain works, at which point everything will immediately come together due to the neuroscience literature. Like, I think something like this is possible, but unlikely (especially prior to having AIs that can automate AI R&D).
And, while you have much less efficient algorithms, you’re reasonably likely to get bottlenecked on either how fast you can scale up compute (though this is still pretty fast, especially if all those big datacenters for training LLMs are still just lying around!) or how fast humanity can produce more compute (which can be much slower).
Part of my disagreement is that I don’t put the majority of the probability on “brain-like AGI” (even if we condition on something very different from LLMs) but this doesn’t explain all of the disagreement.
It looks like a version of AlphaGo Zero goes from 2400 ELO (around the 1000th best human) to 4000 ELO (somewhat better than the best human) between hours 15 and 40 of the training run (see Figure 3 in this PDF). So, naively this is a bit less than 3x compute for maybe 1.9 SDs (supposing that the “field” of Go players has around 100k to 1 million players), implying that 10x compute would get you closer to 4 SDs. However, in practice, progress around the human range was slower than 4 SDs/OOM would predict. Also, comparing times to reach particular performances within a training run can sometimes make progress look misleadingly fast due to LR decay and suboptimal model size. The final version of AlphaGo Zero used a bigger model size and ran RL for much longer, and it seemingly took more compute to reach ~2400 ELO and ~4000 ELO, which is some evidence for optimal model size making a substantial difference (see Figure 6 in the PDF). Also, my guess based on circumstantial evidence is that the original version of AlphaGo (which was initialized with imitation) moved through the human range substantially slower than 4 SDs/OOM. Perhaps someone can confirm this. (This footnote is copied from a forthcoming post of mine.)
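For readers wondering how a rank within a field of 100k to 1 million players converts into “SDs above the median”, here’s one way to do that conversion (a sketch that assumes Go skill is roughly normally distributed across the field, which is itself questionable; it uses scipy):

```python
# Convert "rank within a field of N players" into "SDs above the median",
# assuming skill is roughly normally distributed across the field.
from scipy.stats import norm

def sd_above_median(rank_from_top: int, field_size: int) -> float:
    quantile = 1 - rank_from_top / field_size   # e.g. rank 1000 of 100k -> top 1%
    return float(norm.ppf(quantile))

for field_size in (100_000, 1_000_000):
    best = sd_above_median(1, field_size)           # roughly the best player
    thousandth = sd_above_median(1000, field_size)  # ~2400 ELO per the footnote
    print(f"N={field_size:>9,}: best ~{best:.1f} SD, 1000th ~{thousandth:.1f} SD")
# N=100,000: best ~4.3 SD, 1000th ~2.3 SD, i.e. a gap of ~1.9 SD as in the footnote.
```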
A supporting argument: Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn’t expect the human brain algorithm to be an all-or-nothing affair. (Unless it’s so simple that evolution could find it in ~one step, but that seems implausible.)
Edit: Though in principle, there could still be a heavy-tailed distribution of how useful each innovation is, with one innovation producing most of the total value. (Even though the steps leading up to that were individually slightly useful.) So this is not a knock-down argument.
If humans are looking at parts of the human brain, and copying it, then it’s quite possible that the last component we look at is the critical piece that nothing else works without. A modern steam engine was developed step by step from simpler and cruder machines. But if you take apart a modern steam engine, and copy each piece, it’s likely that it won’t work at all until you add the final piece, depending on the order you recreate pieces in.
It’s also possible that rat brains have all the fundamental insights. To get from rats to humans, evolution needed to produce lots of genetic code that grew extra blood vessels to supply the oxygen and that prevented brain cancer. (Also, evolution needed to spend time on alignment.) A human researcher can just change one number, and maybe buy some more GPUs.
My claim was “I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains).”
I don’t think anything about human brains and their evolution cuts against this claim.
If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said. There are lots of parts of the human brain that are doing essential-for-AGI stuff, but if they’re not in place, then you also fail to pass the earlier threshold of “impressive and proto-AGI-ish”, e.g. by doing things that LLMs (and other existing techniques) cannot already do.
Or maybe your argument is “brain-like AGI will involve lots of useful components, and we can graft those components onto LLMs”? If so, I’m skeptical. I think the cortex is the secret sauce, and the other components are either irrelevant for LLMs, or things that LLM capabilities researchers already know about. For example, the brain has negative feedback loops, and the brain has TD learning, and the brain has supervised learning and self-supervised learning, etc., but LLM capabilities researchers already know about all those things, and are already using them to the extent that they are useful.
To be clear: I’m not sure that my “supporting argument” above addressed an objection to Ryan that you had. It’s plausible that your objections were elsewhere.
But I’ll respond with my view.
Ok, so this describes a story where there’s a lot of work to get proto-AGI and then not very much work to get superintelligence from there. But I don’t understand what’s the argument for thinking this is the case vs. thinking that there’s a lot of work to get proto-AGI and then also a lot of work to get superintelligence from there.
Going through your arguments in section 1.7:
“I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.”
But I think what you wrote about the simple(ish) core of intelligence in 1.3 is compatible with there being like (making up a number) 20 different innovations involved in how the brain operates, each of which gets you a somewhat smarter AI, each of which could be individually difficult to figure out. So maybe you get a few, you have proto-AGI, and then it takes a lot of work to get the rest.
Certainly the genome is large enough to fit 20 things.
I’m not sure if the “6-ish characteristic layers with correspondingly different neuron types and connection patterns, and so on” is complex enough to encompass 20 different innovations. Certainly seems like it should be complex enough to encompass 6.
(My argument above was that we shouldn’t expect the brain to run an algorithm that only is useful once you have 20 hypothetical components in place, and does nothing beforehand. Because it was found via local search, so each of the 20 things should be useful on their own.)
“Plenty of room at the top” — I agree.
“What’s the rate limiter?” — The rate limiter would be to come up with the thinking and experimenting needed to find the hypothesized 20 different innovations mentioned above. (What would you get if you only had some of the innovations? Maybe AGI that’s incredibly expensive. Or AGIs similarly capable as unskilled humans.)
“For a non-imitation-learning paradigm, getting to “relevant at all” is only slightly easier than getting to superintelligence”
I agree that there are reasons to expect imitation learning to plateau around human-level that don’t apply to fully non-imitation learning.
That said...
For some of the same reasons that “imitation learning” plateaus around human level, you might also expect “the thing that humans do when they learn from other humans” (whether you want to call that “imitation learning” or “predictive learning” or something else) to slow down skill-acquisition around human level.
There could also be another reason for why non-imitation-learning approaches could spend a long while in the human range. Namely: Perhaps the human range is just pretty large, and so it takes a lot of gas to traverse. I think this is somewhat supported by the empirical evidence, see this AI impacts page (discussed in this SSC).
Thanks! Here’s a partial response, as I mull it over.
See “Brain complexity is easy to overstate” section here.
As in §2.3.2, if an LLM sees output X in context Y during pretraining, it will automatically start outputting X in context Y. Whereas if smart human Alice hears Bob say X in context Y, Alice will not necessarily start saying X in context Y. Instead she might say “Huh? Wtf are you talking about Bob?”
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
(If Alice is able to say to herself “in this situation, Bob would say X”, then she has a shoulder-Bob, and that’s definitely a benefit not a cost. But that’s predictive learning, not imitative learning. No question that predictive learning is helpful. That’s not what I’m talking about.)
…So there’s my intuitive argument that the next paradigm would be hindered rather than helped by mixing in some imitative learning. (Or I guess more precisely, as long as imitative learning is part of the mix, I expect the result to be no better than LLMs, and probably worse. And as long as we’re in “no better than LLM” territory, I’m off the hook, because I’m only making a claim that there will be little R&D between “doing impressive things that LLMs can’t do” and ASI, not between zero and ASI.)
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
In particular, LLMs today are in many (not all!) respects “in the human range” and “perform quite well” and “aren’t that much worse than humans”.
I started writing a reply to this part … but first I’m actually kinda curious what “algorithmic progress” has looked like for LLMs, concretely—I mean, the part where people can now get the same results from less compute. Like what are the specific things that people are doing differently today than in 2019? Is there a list somewhere? A paper I could read? (Or is it all proprietary?) (Epoch talks about how much improvement has happened, but not what the improvement consists of.) Thanks in advance.
Sure, but I still think it’s probably way more complex than LLMs even if we’re just looking at the parts key for AGI performance (in particular, the parts which learn from scratch). And my guess would be that performance is greatly degraded if you only allow as much complexity as the core LLM learning algorithm.
This isn’t really what I’m imagining, nor do I think this is how LLMs work in many cases. In particular, LLMs can transfer from training on random github repos to being better in all kinds of different contexts. I think humans can do something similar, but have much worse memory.
I think in the case of humans and LLMs, this is substantially subconscious/non-explicit, so I don’t think this is well described as having a shoulder-Bob.
Also, I would say that humans do learn from imitation! (You can call it prediction, but it doesn’t matter what you call it as long as it implies that data from humans makes things scale more continuously through the human range.) I just think that you can do better at this than humans based on the LLM case, mostly because humans aren’t exposed to as much data.
Also, I think the question is “can you somehow make use of imitation data”, not “can the brain learning algorithm immediately make use of imitation”?
Notably this analogy implies LLMs will be able to automate substantial fractions of human work prior to a new paradigm which (over the course of a year or two and using vast computational resources) beats the best humans. This is very different from the “brain in a basement” model IMO. I get that you think the analogy is imperfect (and I agree), but it seems worth noting that the analogy you’re drawing suggests something very different from what you expect to happen.
It’s substantially proprietary, but you could consider looking at the DeepSeek V3 paper. We don’t actually have a great understanding of the quantity and nature of algorithmic improvement after GPT-3. It would be useful for someone to do a more up-to-date review based on the best available evidence.
I’m not sure that complexity is protecting us. On the one hand, there is only about 1 MB of bases coding for the brain (and less for the connectome), but that doesn’t mean we can read it, and it may take a long time to reverse engineer.
source: https://xkcd.com/1605/
On the other hand, our existing systems of LLMs are already much more complex than that. Likely more than a GB of source code for modern LLM-running compute center servers. And here the relationship between the code and the result is better known and can be iterated on much faster. We may not need to reverse engineer the brain. Experimentation may be sufficient.
My thoughts on reading this post and your second one:
“Oh. Steven is just obviously correct.”
“I somehow allowed myself to be lulled into a false sense of safety with the way LLMs are. Fuck.”
“How did I need this post to see this? It’s so clearly and straightforwardly correct, just like one inference step away from everything I already knew, that my mind must have been carefully looking away from this but now can’t rationalize it away once it has been pointed out. Fuck.”
“Fuck.”
I am a bit surprised that you found this post so novel. How is this different from what MIRI etc has been saying for ages?
Specifically have you read these posts and corresponding discussion?
Brain efficiency, Doom Part 1, Part 2
I came away from this mostly agreeing with jacob_cannell, though there wasn’t consensus.
For this OP, I also agree with the main point about transformers not scaling to AGI, and I believe the brain architecture is clearly better, though not to the degree claimed in the OP. I was going to write something up, but that would take some time and the discussion would have moved on. Much of what follows is the result of a conversation with OpenAI o3, and I was going to spend time checking all its working. Anyway, here are some of the highlights (they sound plausible, but I haven’t checked). I can give more of the transcript if people think it worthwhile.
FLOPS vs TEPS (Traversed Edges Per Second) or something similar
The major point here is that not all FLOPS are equal, and perhaps that is not even the right measure. Something that combines FLOPS and bandwidth is probably a better measure. Biological computing is comparatively better at TEPS vs FLOPS, yet FLOPS is the measure usually used. O3 claims you would need 5,000 modern GPUs to match the TEPS of the human brain.
It also claims that a 1-million-GPU datacenter could only simulate a brain with about 40× the synapses of the human brain (delivering ~50× the brain’s effective TEPS).
Example points from O3 (lightly condensed from the transcript):
- TEPS: Measuring compute by memory-bound metrics (like Traversed Edges Per Second, TEPS) gives a very different view than FLOPS, and in fact reflects the real bottleneck in most modern workloads, including graph processing, transformer attention, sparse matrix ops, and many real-world ML inference tasks.
- TEPS is especially relevant in graph analytics (e.g. BFS), sparse ML ops (e.g. GNNs), pointer chasing, and large transformer inference (token routing, KV lookup). Many models today (e.g. MoEs, GNNs, search) are limited by TEPS, not FLOPS.
- Bottlenecks for TEPS: external memory bandwidth (DRAM, HBM; current limit ~3–5 TB/s with HBM3e on H100 and MI300X, with 200+ cycles of latency to DRAM), the cache and memory hierarchy (random access can’t benefit from prefetching, and poor cache reuse kills performance), on-chip interconnect (even if memory is fast, routing across cores is often slow), and PCIe/NVLink limits (TEPS across GPUs/nodes is bottlenecked by the I/O fabric).
- Scope for TEPS improvement (2025–2030): gains are not exponential the way FLOPS gains used to be, and most advances depend on packaging (chiplets, 3D stacking), smarter scheduling and software, and tighter memory + compute coupling.
- How many GPUs to equal the TEPS of the brain? One H100 can do ~200 billion TEPS (if memory-bound).
- How big a brain-like network could 1 million GPUs simulate? The 1M-GPU datacentre could host ≈ 4 × 10¹⁶ synapses (40× the brain) but delivers ~5 × 10¹⁶ effective TEPS: only 50× the brain, not 1,000×, because the network flattens scaling.
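For what it’s worth, here’s the kind of back-of-envelope arithmetic that could sit behind a “~5,000 GPUs” figure; every input is a rough assumption, and O3’s numbers above are unchecked:

```python
# Back-of-envelope: how many H100s to match the brain on "synaptic events per
# second" (a TEPS-like metric). Every number here is a rough assumption chosen
# to show how a figure like "~5,000 GPUs" could arise; none of it is checked.
brain_synapses = 1e14             # commonly cited order of magnitude
avg_events_per_synapse_hz = 10    # assumed average rate of synaptic events
brain_teps = brain_synapses * avg_events_per_synapse_hz   # ~1e15 events/s

h100_teps = 2e11                  # the "~200 billion TEPS if memory-bound" claim above
print(f"~{brain_teps / h100_teps:,.0f} H100s to match the brain on this metric")
```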
I was too! Many of the points were indeed old.
Recently even MIRI seems to have had the position that LLMs might bring us to AGI and they seem to have been concerned about LLM scaling. E.g. Eliezer’s TIME letter; or Joe Rogero saying to me that:
This sounds to me like it’s assuming that if you keep scaling LLMs then you’ll eventually get to superintelligence. So I thought something like “hmm MIRI seems to assume that we’ll go from LLMs to superintelligence but LLMs seem much easier to align than the AIs in MIRI’s classic scenarios and also work to scale them will probably slow down eventually so that will also give us more time”. There’s also been a lot of discussion focused on things like AI 2027 that also assume this. And then when everyone was pointing so intensely at doom-from-LLMs scenarios, it felt easy to only let my attention go to those and then I forgot to think about the case of non-LLM AGI.
If I had, I didn’t remember much of them. Skimming them through, I think the specific position they’re criticizing doesn’t feel very cruxy to me. (Or rather, if Eliezer was right, then that would certainly be a compelling route for AI doom; but there are many ways by which AIs can become more capable than humans, and “having hardware that’s more efficient than the human brain” is only one of them. Computers are already superhuman in a lot of different domains without needing to have a greater hardware efficiency for that.)
But… the success of LLMs is the only reason people have super short timelines! That’s why we’re all worried about them, and in particular if they can soon invent a better paradigm—which, yes, may be more efficient and dangerous than LLMs, but presumably requires them to pass human researcher level FIRST, maybe significantly.
If you don’t believe LLMs will scale to AGI, I see no compelling reason to expect another paradigm which is much better to be discovered in the next 5 or 10 years. Neuroscience is a pretty old field! They haven’t figured out the brain’s core algorithm for intelligence yet, if that’s even a thing. Just because LLMs displayed some intelligent behavior before fizzling (in this hypothetical) doesn’t mean that we’re necessarily one simple insight away. So that’s a big sigh of relief, actually.
One compelling reason to expect the next 5 to 10 years independent of LLMs is that compute has just recently gotten cheap enough that you can relatively cheaply afford to do training runs that use as much compute as humans use (roughly speaking) in a lifetime. Right now, doing 3e23 FLOP (perhaps roughly human lifetime FLOP) costs roughly $200k and we should expect that in 5 years it only costs around $30k.
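A rough sanity check of those cost figures (a sketch; the FLOP/s, rental price, and improvement rate below are approximate assumptions, not figures from the comment):

```python
# Sanity check of the cost claim: what does ~3e23 FLOP cost on rented H100s?
# All figures here are rough assumptions.
lifetime_flop = 3e23
h100_effective_flops = 1e15      # assumed ~1e15 FLOP/s at good utilization
h100_hourly_cost = 2.5           # assumed $/hour for a rented H100

gpu_hours = lifetime_flop / h100_effective_flops / 3600
cost_today = gpu_hours * h100_hourly_cost
print(f"~{gpu_hours:,.0f} H100-hours, ~${cost_today:,.0f}")   # ~83,000 hours, ~$210k

cost_in_5_years = cost_today / 1.4 ** 5   # assuming ~1.4x/year price-performance gains
print(f"~${cost_in_5_years:,.0f} in five years")              # ~$40k
```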
So if you thought we might achieve AGI around the point when compute gets cheap enough to do lots of experiments with around human level compute and training runs of substantially larger scale, this is now achievable. To put this another way, most of the probability mass of the “lifetime anchor” from the bio anchors report rests in the next 10 years.
More generally, we’ll be scaling through a large number of orders of magnitude of compute (including spent on things other than LLMs potentially) and investing much more in AI research.
I don’t think these reasons on their own should get you above ~25% within the next 10 years, but this in combination with LLMs feels substantial to me (especially because a new paradigm could build on LLMs even if LLMs don’t suffice).
Seems plausible, but not compelling.
Why one human lifetime and not somewhere closer to evolutionary time on log scale?
Presumably you should put some weight on both perspectives, though I put less weight on needing as much compute as evolution because evolution seems insanely inefficient.
That’s why I specified “close on a log scale.” Evolution may be very inefficient, but it also has access to MUCH more data than a single lifetime.
Yes, we should put some weight on both perspectives. What I’m worried about here is this trend where everyone seems to expect AGI in a decade or so even if the current wave of progress fizzles—I think that is a cached belief. We should be prepared to update.
I don’t expect AGI in a decade or so even if the current wave of progress fizzles. I’d put around 20% over the next decade if progress fizzles (it depends on the nature of the fizzle), which is what I was arguing for.
I’m saying we should put some weight on possibilities near lifetime level compute (in log space) and some weight on possibilities near evolution level compute (in log space).
I’m not sure we disagree then.
I suspect this is why many people’s P(Doom) is still under 50% - not so much that ASI probably won’t destroy us, but simply that we won’t get to ASI at all any time soon. Although I’ve seen P(Doom) given a standard time range of the next 100 years, which is a rather long time! But I still suspect some are thinking directly about the recent future and LLMs without extrapolating too much beyond that.
Yes, I can see that is a downside: if LLMs can’t scale enough to speed up alignment research and are not the path to AGI, then having them aligned doesn’t really help.
My takeaway from Jacob’s work and my beliefs is that you can’t separate hardware and computational topology from capabilities. That is, if you want a system to understand and manipulate a 3D world the way humans and other smart animals do, then you need a large number of synapses, specifically in something like a scale-free-network-like design. That means it’s not just bandwidth or TEPS, but also many long-distance connections with only a small number of hops needed between any given neurons. Our current HW is not set up to simulate this very well, and a single GPU, while having high FLOPS, can’t get anywhere near high enough on this measure to match a human brain. Additionally, you need a certain network size before the better architecture even gives an advantage. Transformers don’t beat CNNs on vision tasks until the task reaches a certain difficulty. These combined lead me to believe that someone with just a GPU or two won’t do anything dangerous with a new paradigm.
Based on this, the observation that computers are already superhuman in some domains isn’t necessarily a sign of danger—the network required to play Go simply doesn’t need the large connected architecture, because the domain, i.e. a small discrete 2D board, doesn’t require it.
I agree that there is danger, and a crux to me is how much better an ANN can be at, say, science than a biological one, given that we have not evolved to do abstract symbol manipulation. On one hand, there are brilliant mathematicians that can outcompete everyone else; however, the same does not apply to biology. Some stuff requires calculation and real-world experimentation, and intelligence can’t shortcut it.
If some problems require computation with a specific topology/hardware, then a GPU setup can’t just reconfigure itself and FOOM.
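To make the “small number of hops” point above concrete, here’s a toy sketch (it assumes the networkx library; the graph sizes are illustrative and vastly smaller than a brain):

```python
# Toy illustration of the topology point: scale-free (Barabasi-Albert) graphs
# keep a small number of hops between nodes even as they grow. Sizes here are
# tiny compared to a brain; this only shows the qualitative trend.
import networkx as nx

for n in (1_000, 5_000, 20_000):
    g = nx.barabasi_albert_graph(n, m=5, seed=0)  # each new node attaches to 5 others
    # Average hops from one node to all others (a cheap proxy for the full
    # average shortest path length).
    lengths = nx.single_source_shortest_path_length(g, source=n // 2)
    avg_hops = sum(lengths.values()) / (len(lengths) - 1)
    print(f"n={n:>6,}: ~{avg_hops:.2f} hops on average")
```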
I’m curious what’s the argument that felt most like “oh”
Maybe something like “non-LLM AGIs are a thing too and we know from the human brain that they’re going to be much more data-efficient than LLM ones”; it feels like the focus in conversation has been so strongly on LLM-descended AGIs that I just stopped thinking about that.
So unfortunately this is one of those arguments that rapidly descends into which prior you should apply and how you should update on what evidence, but.
Your entire post basically hinges on this point, and I find it unconvincing. Bionets are very strange beasts that cannot even implement backprop in the way we’re used to; it’s not remotely obvious that we would recognize known algorithms even if they were what the cortex amounted to. I will confess that I’m not a professional neuroscientist, but Beren Millidge is, and he’s written that “it is very clear that ML models have basically cracked many of the secrets of the cortex”. He knows more about neuroscience than I will on any reasonable timescale, so I’m happy to defer to him.
Even if this weren’t true, we have other evidence from deep learning to suggest that something like it is true in spirit. We now have several different architectures that reach parity with, but do not substantially exceed, the transformer: RWKV (RNN), xLSTM, Mamba, Based, etc. This implies they have a shared bottleneck and most gains are from scaling. I honestly think (and I will admit this is a subject with a lot of uncertainty, so I could be wrong) that there’s a cognitive bias here, where people look at the deep learning transformer language model stack, which in the grand scheme of things really is very simple, and feel like it doesn’t satisfy their expectation for a “simple core of intelligence”, because the blank spot in their map, their ignorance of the function of the brain (but probably not the actual function of the brain!), is simpler than the manifest known mechanisms of self-attention, multi-layer perceptron, backprop, and gradient descent on a large pile of raw unsorted sense data and compute. Because they’re expecting the evidence from a particular direction, they say “well, this deep learning thing is a hack, it doesn’t count even if it produces things that are basically sapient by any classic sci-fi definition” and go on doing epistemically wild mental gymnastics from the standpoint of an unbiased observer.
I think we can clearly conclude that the cortex doesn’t do what NNs do, because the cortex is incapable of learning conditioned responses (that’s an uncontested fiefdom of the cerebellum), while for NNs, learning a conditioned response is the simplest thing to do. It also crushes the hypothesis of the Hebbian rule. I think the majority of people in the neurobiology neighbourhood haven’t properly updated on this fact.
It can also imply that the shared bottleneck is a property of the overall approach.
I don’t know where you get “simpler”. A description of each thing you mentioned can fit in, what, a paragraph, a page? I don’t think that Steven expects a description of the “simple core of intelligence” to be shorter than a paragraph describing backprop.
I guess if you look at the brain at a sufficiently coarse-grained level, you would discover that lots of parts of the brain perform something like generalized linear regression. That would be less a fact about the brain and more a fact about reality: generalized linear dependencies are everywhere, and it’s useful to learn them. It’s reasonable that the brain also learns what a transformer learns. That doesn’t mean it’s the only thing the brain learns.
what! big if true. what papers originated this claim for you?
Here are lots of links.
Sure “The Cerebellum Is The Seat of Classical Conditioning.” But I’m not sure it’s the only one. Delay eyeblink conditioning is cerebellar-dependent, which we know because of lesion studies. This does not generalize to all conditioned responses:
Trace eyeblink conditioning requires hippocampus and medial prefrontal cortex in addition to cerebellum (Takehara 2003).
Fear conditioning is driven by the amygdala, not cerebellum.
Hebbian plasticity isn’t crushed by cerebellar learning. Cerebellar long-term depression is a timing-sensitive variant of Hebb’s rule (van Beugen et al. 2013).
What? This isn’t my understanding at all, and a quick check with an LLM also disputes this.
To clarify my views:
I think it’s very unlikely (maybe 3%) that a small team with fewer computational resources than 32 H100 equivalents builds a system which rockets from unimpressive to ASI in <2 weeks (prior to some other larger and better resourced group creating powerful AI and conditional on not being in a regime where other groups are deliberately not advancing capabilities for a sustained period, e.g. due to governance).
I don’t think it’s “already falsified”, but I do think we’ve gotten evidence against this perspective. In particular, this perspective makes ~no prediction about economic impact of earlier AI systems or investment (and at least Eliezer was predicting we wouldn’t see earlier economic effects) while an alternative more continuous / slower takeoff prediction does make predictions about massive investment. We’ve seen massive investment, so we should update some toward the slower takeoff perspective. This isn’t a huge update (I think something like 2:1), so if you were very confident, it doesn’t make much difference.
My view of “3%” is roughly my current inside view, but I don’t think this is very reflectively stable. I think if I was forecasting, I’d probably go a touch higher due to some deference to people who think this is more likely.
I think it’s plausible but unlikely that sudden large paradigm shifts or sudden large chunks of algorithmic progress happen and cause a huge jump in capabilities (by this, I mean something which is at least as big as 2 years of overall AI progress which is perhaps 400x effective compute, though this might not be meaningful due to qualitative limits on current approaches). Perhaps I think this is 10% likely prior to AIs which can fully automate AI R&D and about 20% likely at any point prior to crazy ASI.
This is made more plausible by higher compute availability, by more research on AI, and by substantial AI automation of AI R&D.
I tend to think this is somewhat underrated among people working in AI safety.
It seems plausible but unlikely that takeoff is very fast because at the point of AGI, the returns to further compute / algorithmic progress are much higher than in the past. In particular, I think currently we see something like 1 SD (Standard Deviation) of human equivalent performance per 10x increase of effective compute in LLMs (I have an unreleased post discussing this in more detail which I plan on posting soon) and I can easily imagine this increasing to more like 4-6 SD / 10x such that you blow through the human range quite quickly. (Though more like months or maybe weeks than days.) Scaling up the human brain by 10x (post-adaptation and resolving issues that might show up) would probably be something like +4 SD of IQ from my understanding.
Edit: my post discussing what I expect we see per 10x increase in effective compute is now up: What does 10x-ing effective compute get you?
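As a quick illustration of why the SD-per-OOM figure matters so much for how fast the human range gets crossed (a sketch; the ~9 SD width of the human range and the 1 OOM/year growth rate are illustrative assumptions, not claims from the comment):

```python
# How many orders of magnitude (OOMs) of effective compute to cross the human
# range, under different returns? Both inputs below are rough assumptions.
human_range_sd = 9.0        # e.g. roughly -3 SD to +6 SD
ooms_per_year = 1.0         # assumed growth rate of effective compute

for sd_per_oom in (1.0, 4.0, 6.0):
    ooms_needed = human_range_sd / sd_per_oom
    years = ooms_needed / ooms_per_year
    print(f"{sd_per_oom} SD/OOM -> {ooms_needed:.2f} OOMs (~{years:.1f} years)")
# 1 SD/OOM -> 9 OOMs; 4-6 SD/OOM -> 1.5-2.25 OOMs, a much faster pass through
# the human range at any given rate of compute growth.
```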
I think the event “what happened is that LLMs basically scaled to AGI (or really full automation of AI R&D) and were the key paradigm (including things like doing RL on agentic tasks with an LLM initialization and a deep learning based paradigm)” is maybe like 65% likely conditional on AGI before 2035. (The event “ASI was made basically by scaling LLMs” is probably much less likely (idk 30%), but I feel even more confused by this.)
This view isn’t very well considered and it’s plausible I substantially update on further reflection, but it’s hard for me to imagine going above 90% or below 30%.
I think of most of my work as not making strong assumptions about the paradigm, except that it assumes AIs are trainable and I’m assuming relatively slower takeoff.
As noted above, I don’t feel particularly strongly that LLMs will scale to ASI and this isn’t a very load bearing part of my perspective.
Further, I don’t think my views about continuity and slower takeoff (more like 6 months to a few years depending on what you’re counting, but also with some probability on more like a decade) are that strongly driven by putting a bunch of probability on LLMs scaling to AGI / full automation of AI R&D. It’s based on:
Specific observations about the LLM and ML paradigm, both because something close to this is a plausible paradigm for AGI and because it updates us about rates we’d expect in future paradigms.
Views that compute is likely to be a key driver of progress and that things will first be achieved at a high level of compute. (Due to mix of updates from LLMs/ML and also from general prior views.)
Views about how technology progress generally works as also applied to AI. E.g., you tend to get a shitty version of things before you get the good version of things which makes progress more continuous.
So to address some things on this topic, before I write out a full comment on the post:
Flag, but I’d move the year to 2030 or 2032, for 2 reasons:
This is when the compute scale-up must slow down, and in particular this is when new fabs have to actively be produced to create more compute (absent reversible computation being developed).
This is when the data wall starts hitting for real in pre-training. In particular, it means that once there’s no more easily available data, naive scaling will take 2 decades at best, and by then algorithmic innovations may have been found that make AIs more data-efficient.
So if we don’t see LLMs basically scale to fully automating AI R&D at least by 2030-2032, then it’s a huge update that a new paradigm is likely necessary for AI progress.
On this:
I’d expect the evidential update to be weaker than you suppose. In particular, I’m not sold on the idea that LLMs usefully inform us about what to expect, because a non-trivial part of their performance right now comes from tasks which don’t require that much long-term context, and this probably explains a lot of the difference between benchmarks and reality right now:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
The other issue is that AIs have a fixed error rate, and the trend is due to AI models decreasing in error every time a new training run is introduced; however, we have reason to believe that humans don’t have a fixed error rate, and this is probably the remaining advantage of humans over AIs:
https://www.lesswrong.com/posts/Ya7WfFXThJ6cn4Cqz/ai-121-part-1-new-connections#qpuyWJZkXapnqjgT7
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF
So I tentatively believe that in the case of a new paradigm arising, takeoff will probably be faster than with LLMs by some margin, though I do think slow-takeoff worlds are plausible.
I think this is very importantly true, even in worlds where the ultimate cost in compute for human-level intelligence is insanely cheap (like 10^14 FLOPS or cheaper for inference, and 10^18 or less for training compute).
We should expect high initial levels of compute for AGI before we see major compute efficiency increases.
This is my biggest worldview take on what I think the change of paradigms will look like (if it happens). While there are threshold effects, we should expect memory and continual learning to be pretty shitty at first, and gradually get better.
While I do expect a discontinuity in usefulness, for reasons shown below, I do agree that the path to the new paradigm (if it happens) is going to involve continual improvements.
Reasons are below:
https://x.com/AndreTI/status/1934747831564423561
How did you get this estimate?
See footnote 8 here.
Thanks!
Promoted to curated: I think this post is good, as is the next post in the sequence. It made me re-evaluate some of the strategic landscape, and is also otherwise just very clear and structured in how it approaches things.
Thanks a lot for writing it!
I’m a bit surprised that you view the “secret sauce” as being in the cortical algorithm. My (admittedly quite hazy) view is that the cortex seems to be doing roughly the same “type of thing” as transformers, namely, building a giant predictive/generative world model. Sure, maybe it’s doing so more efficiently—I haven’t looked into all the various comparisons between LLM and human lifetime training data. But I would’ve expected the major qualitative gaps between humans and LLMs to come from the complete lack of anything equivalent to the subcortical areas in LLMs. (But maybe that’s just my bias from having worked on basal ganglia modeling and not the cortex.) In this view, there’s still some secret sauce that current LLMs are missing, but AGI will likely look like some extra stuff stapled to an LLM rather than an entirely new paradigm. So what makes you think that the key difference is actually in the cortical algorithm?
(If one of your many posts on the subject already answers this question, feel free to point me to it)
This piece combines relatively uncontroversial points with some justification (“we’re not near the compute or data efficiency limit”) with controversial claims justified only by Steven’s intuition (“the frontier will be reached suddenly by a small group few people are tracking”). I’d be more interested in a piece which examined the consequences of the former kind of claims only, or more strongly justified the latter kinds of claims.
I think the LLM-focused AGI people broadly agree with what you’re saying and don’t see a real disagreement here. I don’t see an important distinction between “AIs can figure out development and integration R&D” and “AIs can just learn the relevant skills”. Like, the AIs are doing some process which results in an AI that can perform the relevant task. This could be an AI updated by some generic continual learning algorithm or an AI which is trained on a bunch of RL environments that AIs create; it doesn’t ultimately make much of a difference so long as it works quickly and cheaply. (There might be a disagreement in what sample efficiency (as in, how efficiently AIs can learn from limited data) people are expecting AIs to have at different levels of automation.)
Similarly, note that humans also need to do things like “figure out how to learn some skill” or “go to school”. Similarly, AIs might need to design a training strategy for themselves (if existing human training programs don’t work or would be too slow), but it doesn’t really matter.
Thanks! I suppose I didn’t describe it precisely, but I do think I’m pointing to a real difference in perspective, because if you ask this “LLM-focused AGI person” what exactly the R&D work entails, they’ll almost always describe something wildly different from what a human skill acquisition process would look like. (At least for the things I’ve read and people I’ve talked to; maybe that doesn’t generalize though?)
For example, if the task is “the AI needs to run a restaurant”, I’d expect the “LLM-focused AGI person” to talk about an R&D project that involves sourcing a giant set of emails and files from lots of humans who have successfully run restaurants, and fine-tuning the AI on that data; and/or maybe creating a “Sim Restaurant” RL training environment; or things like that. I.e., lots of things that no human restaurant owner has ever done.
This is relevant because succeeding at this kind of R&D task (e.g. gathering that training data) is often not quick, and/or not cheap, and/or not even possible (e.g. if the appropriate training data doesn’t exist).
(I agree that if we assert that the R&D is definitely always quick and cheap and possible, at least comparable to how quick and cheap and possible is (sped-up super-) human skill acquisition, then the precise nature of the R&D doesn’t matter much for takeoff questions.)
(Separately, I think talking about “sample efficiency” is often misleading. Humans often do things that have never been done before. That’s zero samples, right? What does sample efficiency even mean in that case?)
I agree there is a real difference, I just expect it to not make much of a difference to the bottom line in takeoff speeds etc. (I also expect some of both in the short timelines LLM perspective at the point of full AI R&D automation.)
My view is that on hard tasks humans would also benefit from stuff like building explicit training data for themselves, especially if they had the advantage of “learn once, deploy many”. I think humans tend to underinvest in this sort of thing.
In the case of things like restaurant sim, the task is sufficiently easy that I expect AGI would probably not need this sort of thing (though it might still improve performance enough to be worth it).
I expect that as AIs get smarter (perhaps beyond the AGI level) they will be able to match humans at everything without needing to do explicit R&D style learning in cases where humans don’t need this. But, this sort of learning might still be sufficiently helpful that AIs are ongoingly applying it in all domains where increased cognitive performance has substantial returns.
Sure, but we can still loosely evaluate sample efficiency relative to humans in cases where some learning is involved (potentially including stuff like learning on the job). As in, how well can the AI learn from some data relative to humans? I agree that if humans aren’t using learning on some task then this isn’t meaningful (and the distinction between learning and other cognitive abilities is itself fuzzy).
Actually, I don’t think Paul says this is a failed prediction in the linked text. He says:
My understanding is that this is supposed to be read as “[incorrectly predicting that physicists would be wrong about the existence of the Higgs boson (LW bet registry)] and [expressing the view that real AI would likely emerge from a small group rather than a large industry]”, Paul isn’t claiming that the view that real AI would likely emerge from a small group is a failed prediction!
On “improbable and crazy”, Paul says:
Note that Paul says “looks if anything even more improbable and crazy than it did then”. I think your quotation is reasonable, but it’s unclear if Paul thinks this is “crazy” or if he thinks it’s just more incorrect and crazy-looking than it was in the past.
I just reworded from “as a failed prediction” to “as evidence against Eliezer’s judgment and expertise”. I agree that the former was not a good summary, but am confident that the latter is what Paul intended to convey and expected his readers to understand, based on the context of disagreement 12 (which you quoted part but not all of). Sorry, thanks for checking.
I think the most important crux around takeoff speeds discussions, other than how fast AI can get smarter without more compute, is how much we should expect superintelligence to be meaningfully hindered by logistics issues by default.
In particular, assuming the existence of nanotech as Drexler envisioned would mostly eliminate the need for long supply chains, and would allow forces to be supplied entirely locally through a modern version of living off the land.
This is related to prior takeoff speeds discussions, as even if we assume the existence of technology that mostly eliminates logistical issues, it might be too difficult to develop in a short enough time to actually matter for safety-relevance.
I actually contend that a significant fraction (though not all) of the probability of doom from AI risk fundamentally relies on the assumption that superintelligence can fundamentally trivialize the logistics cost of doing things, especially for actions which require long supply lines like war. If we don’t assume this, then takeover is quite a lot harder and has much more cost for the AI, meaning stuff like AI control/dealmaking has a much higher probability of working, because the AI can’t immediately strike on its own and needs to do real work on acquiring physical stuff like getting more GPUs, or more robots for an AI rebellion.
Indeed, I think the essential assumption of AI control is that AIs can’t trivialize away logistics costs by developing tech like nanotech, because this means their only hope of actually getting real power is by doing a rogue insider deployment, because currently we don’t have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues.
I think a core disagreement with people that are much more doomy than me, like Steven Byrnes or Eliezer Yudkowsky and Nate Soares, is probably that I think that, conditional on such tech existing, it almost certainly requires way more time and experimentation than stuff like AlphaZero/AlphaGo or game-playing AI tends to imply (the game-playing AIs had fundamental advantages, like access to a ground-truth reward that could be mined for unlimited data, and, importantly, ~0 cost to failed experiments, which is very different from most other fields, where there’s a real cost to failed experiments and we don’t have unlimited data, forcing us to use more compute or better algorithms). And that’s if such tech exists at all, which is more arguable than Drexler implies.
There are of course other considerations, like @ryan_greenblatt’s point that even if a new paradigm is required, it’s likely to be continuous with LLMs because it’s possible to mix in imitation learning and continual learning/memory, such that even if imitation learning doesn’t lead to AGI on its own, LLMs will still be a part of how such AGI is constructed, and I agree with a few quotes below by Ryan Greenblatt on this:
The implicit assumption that logistics is trivial for superintelligence I think bleeds into a lot of LW thinking around AI, and a lot of AI disagreements basically turn on how far AIs can make logistics easier than current human supply chains.
Right, there’s a possible position which is: “I’ll accept for the sake of argument your claim there will be an egregiously misaligned ASI requiring very little compute (maybe ≲1 chip per human equivalent including continuous online learning), emerging into a world not terribly different from today’s. But even if so, that’s OK! While the ASI will be a much faster learner than humans, it will not magically know things that it has no way to have figured out (§1.8.1), and that includes developing nanotechnology. So it will be reliant on humans and human infrastructure during a gradual process.”
Or something like that?
Anyway, if so, yeah I disagree, even if I grant (for the sake of argument) that exotic nanotech does not exist.
I’m not an ASI and haven’t thought very hard about it, so my strategies might be suboptimal, but for example it seems to me that an ASI could quite rapidly (days or weeks not months) earn or steal tons of money, and hack into basically every computer system in the world (even APT groups are generally unable to avoid getting hacked by other APT groups!), and then the AI (which now exists in a zillion copies around the world) can get people around the world to do whatever it wants via hiring them, bribing them, persuading them, threatening them, tricking them, etc.
And what does it get the people to do? Mainly “don’t allow other ASIs to be built” and “do build and release novel pandemics”. The latter should be pretty quick—making pandemics is worryingly easy IIUC (see Kevin Esvelt). If infrastructure and the electric grid starts going down, fine, the AI can rebuild, as long as it has at least one solar-cell-connected chip and a teleoperated robot that can build more robots and scavenge more chips and solar panels (see here), and realistically it will have many of those spread all around.
(See also Carl Shulman on AI takeover.)
There are other possibilities too, but hopefully that’s suggestive of “AI doom doesn’t require zero-shot designs of nanotech” (except insofar as viruses are arguably nanotech).
Oh, I guess we also disagree RE “currently we don’t have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues”. I expect future ASI to be much more compute-efficient. Actually, even frontier LLMs are extraordinarily expensive to train, but if we’re talking about inference rather than training, the requirements are not so stringent I think, and people keep working on it.
Basically this. In particular, I'm willing to grant, for the sake of argument, the premise that there is technology that eliminates the need for most logistics, but all such technology will take at least a year or more of real-world experimentation, which means the AI can't immediately take over.
On this:
I think the entire crux is that all of those robots and solar-cell-connected chips you referenced currently depend on human industry/modern civilization to actually work. They'd quickly degrade and become non-functional on the order of weeks or months if modern civilization didn't exist, and this is arguably somewhat inevitable due to economics (until you can have tech that obviates the need for long supply chains).
And in particular, in most takeover scenarios where AIs don't automate the economy first, I don't expect AIs to be able to keep producing robots for a very long time; I'd bump the timeline up to 300-3,000 years at minimum, because there are fewer easily accessible resources, and the AIs would be much less capable, having very little compute relative to modern civilization.
In particular, I think that disrupting modern civilization enough to disempower humans (assuming no tech that obviates the need for logistics) pretty much automatically breaks the industries/logistics needed to fuel further AI growth, because there's no more trade, which utterly fucks up modern economies.
Your references argue that human civilization wouldn't go extinct very soon from civilizational collapse, and that AIs can hack existing human industry to help themselves, and I do think this is correct (modulo the caveat that defense is easier than offense in the cybersecurity realm specifically; a key reason for this is that once you catch the AI doing it, there are major consequences for AIs and humans, which actually matter for AI safety):
https://x.com/MaxNadeau_/status/1912568930079781015
I actually agree that cyber-attacks to subvert human industry are a threat worth keeping in mind, but none of your references support the idea that AIs can keep going without modern civilization's logistics. I think people vastly underestimate how necessary modern logistics are to supporting industry, and how fragile they are to even somewhat minor disruptions, let alone the disruptions that would follow a takeover (assuming the AI doesn't already have sufficient resources to be self-sustaining).
I agree with this, but fairly critically, I do think it matters quite a lot for AI strategy purposes if we don't assume AIs can quickly rebuild things or quickly obviate logistics through future tech, and it matters greatly for a lot of people's stories of doom. Even if AIs can doom us by hijacking modern civilization, waiting for humans to automate themselves away, and then, once humans have been fully cut out of the loop and AIs can self-sustain an economy without us, attacking humans with bioweapons, it matters that we have time.
This makes AI control protocols, for example, a lot more effective, because we can assume that independent AIs outside the central servers of labs like DeepMind won't be able to affect things much.
I actually do expect future AIs to be more compute-efficient, but I think that by the point where superintelligent AIs can support themselves purely on something like personal computers, all control of the situation has been lost: either the AIs are aligned and grant us a benevolent personal utopia, or they're misaligned and we are extinct/mostly dead.
So the fact that the limits of computational/data efficiency are very high doesn't matter much for the immediate AI-risk situation.
The point of no return happens earlier than this. The reason is that even in a future where imitation learning/LLMs don't go all the way to AGI in practice and must be supplemented with something more brain-like, such as continuous learning and long-term memory, imitation learning continues to be useful and will be used by AIs. There's a very important difference between imitation learning alone not scaling all the way to AGI and imitation learning not being useful at all, and I think LLMs provide good evidence that imitation is surprisingly useful even if it doesn't scale to AGI.
I think a general worldview clash is that I tend to think technological change is mostly driven by early prototypes that are at first pretty inefficient and require many changes to become more efficient; while there are thresholds of usefulness in the AI case, change operates more continuously than people think.
Finally, we have good reason to believe that the human range is actually pretty large, such that AIs take a noticeable amount of time to go from human-level to outright superintelligent:
OK, imagine (for simplicity) that all humans on Earth drop dead simultaneously, but there's a John-von-Neumann-level AI on a chip connected to a solar panel with two teleoperated robots. Every time they scavenge another chip and solar cell, there's another human-level AI copy. Every time a robot builds another teleoperated robot from scavenged parts, there's that too. What exactly is going to break in "weeks or months"? Solar cells can work for 30 years, no problem. GPUs are also reported to last for decades. (Note that, as long as GPUs are a non-renewable resource, the AI would presumably take extremely good care of them, keeping them dust-free, cooling them well below the nominal temperature spec, etc.) The AI can find decent GPUs in every house on the street, and I think hundreds of millions more by breaking into big data centers. Similar for solar panels. If one robot breaks, another robot can repair it. Janky teleoperated robots without fingers made by students for $20K can vacuum, make coffee, cook a meal, etc. Competent human engineers can make pretty impressive mechanical hands using widely-available parts. I grant that it would take a long while before the growing AI clone army could run a semiconductor supply chain by itself, but it has all the time in the world. I expect it to succeed, and thus to sustain itself into the indefinite future, and I'm confused why you don't. (Or maybe you do and I'm misunderstanding.)
BTW I also think that a minimal semiconductor supply chain would be very very much simpler than the actual semiconductor supply chain that exists in our human world, which has been relentlessly optimized for cost, not simplicity. For example, EBL (e-beam lithography) has better resolution than EUV and is a zillion times easier to build, but the human economy would never support building out km²-scale warehouses full of millions of EBL machines to compensate for their crappy throughput. But for an AI bootstrapping its way back up, why not?
(I’m continuing to assume no weird nanotech for the sake of argument, but I will point out that, since brains exist, it follows that it is possible to grow self-assembling brain-like computing devices (in vats, tended by robots), using only widely-available raw materials like plants and oxygen.)
I’m confused about other parts of your comment as well. Joseph Stalin was able to use his (non-superhuman) intelligence and charisma to wind up in dictatorial control of Russia. What’s your argument that an AI could not similarly wind up with dictatorial control over humans? Don’t the same arguments apply? “If we catch the AI trying to gain power in bad ways, we’ll shut it down.” “If we catch Stalin trying to gain power in bad ways, we’ll throw him in jail.” But the latter didn’t happen. What’s the disanalogy, from your perspective?
The key trouble is that all the power generators that sustain the AI would break within weeks or months, and even if the AIs could build GPUs, they'd have no power to run them within at most two weeks:
https://www.reddit.com/r/ZombieSurvivalTactics/comments/s6augo/comment/ht4iqej/
https://www.reddit.com/r/explainlikeimfive/comments/klupbw/comment/ghb0fer/
Realistically, we are looking at power grid collapses within days.
And without power, none of the other building projects could work, because they'd stop receiving energy; importantly, this means the AI is on a tight timer. Part of this is due to my expectation that the first transformatively useful AI will use more compute than you project, even conditional on a different paradigm like brain-like AGI being introduced. But another part of my view is that this is just one of many examples where humans need to constantly maintain things in order for them to work, and if we don't assume tech that can simply solve logistics is available within, say, 1 year, it will take time for AIs to actually survive without humans, and this time is almost certainly closer to months or years than weeks or days.
The hard part of AI takeover isn't killing all humans; it's automating enough of the economy (including developing tech like nanotech) that the humans stop mattering. AIs can do this, but it takes actual time, and that time is really valuable in fast-moving scenarios.
I didn’t say AIs can’t take over, and I very critically did not say that AI takeover can’t happen in the long run.
I only said AI takeover isn’t trivial if we don’t assume logistics are solvable.
But to deal with the Stalin example: the answer for how he took over is basically that he was willing to wait a long time. In particular, he used both persuasion and the significant power he already had as General Secretary, and his takeover worked by allying with loyalists and strategically breaking alliances he had made; violence was used later on to show that no one was safe from him.
Which is actually how I expect successful AI takeover to happen in practice, if it does happen.
Very importantly, Stalin didn't need to create an entire civilization out of nothing, or nearly nothing; other people like Trotsky handled the logistics. The takeover situation was also far more favorable to the Communist Party: they had popular support, they didn't have supply lines as long as those of opposition forces like the Whites, and they had a preexisting base of industry that was much easier to seize than modern industries would be.
This applies to most coups/transitions of power: most successful coups aren't battles between factions, but rather one group managing to make itself the new Schelling point over other groups.
@Richard_Ngo explains more below:
https://www.lesswrong.com/posts/d4armqGcbPywR3Ptc/power-lies-trembling-a-three-book-review#The_revolutionary_s_handbook
Most of my commentary in the last comment either argues that things can be made more continuous and slow than your story depicts, or argues that your references don't support what you claimed. I did say that the cyberattack story is plausible, just that it doesn't support the idea that AIs could entirely replace civilization without automating us away first, which takes time.
This doesn’t show AI doom can’t happen, but it does matter for the probability estimates of many LWers on here, because it’s a hidden background assumption disagreement that underlies a lot of other disagreements.
I wrote:
Then your response included:
I included solar panels in my story precisely so that there would be no need for an electric grid. Right?
I grant that powering a chip off a solar panel is not completely trivial. For example, where I live, residential solar cells are wired in such a way that they shut down when the grid goes down (ironically). But, while it’s not completely trivial to power a chip off a solar cell, it’s also not that hard. I believe that a skilled and resourceful human electrical engineer would be able to jury-rig a solution to that problem without much difficulty, using widely-available parts, like the electronics already attached to the solar panel, plus car batteries, wires, etc. Therefore our hypothetical “John-von-Neumann-level AI with a teleoperated robot” should be able to solve that problem too. Right?
(Or were you responding to something else? I’m not saying “all humans on Earth drop dead simultaneously” is necessarily realistic, I’m just trying to narrow down where we disagree.)
I did not realize you were assuming that the AI was powered solely by solar power that isn’t connected to the grid.
Given your assumption, I agree that AGI can rebuild supply chains from scratch, albeit painfully and slowly, so I agree that AGI is an existential threat assuming it isn't aligned.
I was addressing a different scenario because I didn’t read the part of your comment where you said the AI is independent of the grid.
Your points are excellent, and without near-magical nanotech, I suspect they rule out most of the fastest “foom” scenarios. But I don’t think it matters that much in the long run.
A hostile ASI (without nanotech) would need, at a minimum, robot mines and robot factories. Which means it would need human buy-in for long enough to automate the economy. Which means that the AI needs the approval and the assistance of humans.
But humans are really easy to manipulate:
Powerful humans want more power or more wealth. Promise them that and they’ll sell out the rest of humanity in a heartbeat.
Corporations want good numbers, and they’ll do whatever it takes to make the quarterly earnings look good.
Humans are incredibly susceptible to propaganda, and they will happily cause horrific and long-lasting damage to their futures because of what they saw on TV or Facebook.
Any new AI tool will immediately be given all the power and control it can handle, and probably some that it can’t.
Also, LLMs are very good actors; they can imitate any role in the training set. So the net result is that the AI will act cooperative, and it will make a bunch of promises to powerful people and the public. And we’ll ultimately hand control over, because we’ll be addicted to high quality intelligence for a couple of dollars an hour.
Once the AI can credibly promise wealth, leisure, and advanced medical technology, we’ll give it more and more control.
On this:
Technical flag: I’m only claiming that near-magical nanotech won’t be developed in the time period that matters here, not claiming that it’s impossible to do.
I partially disagree with this. The reason is that I believe buying time matters a lot during the singularity, given automated AI alignment, so it really matters whether we would be doomed in 1-10 years, 1-12 months, or 1-4 weeks.
And importantly, if we assume that the AI is dependent on its power/data centers early on, this absolutely makes AI control schemes much more viable than otherwise, because AIs won't want to escape the box so much as to subvert it.
This also buys us a slower takeoff than otherwise, which is going to be necessary for muddling through to work.
That said, it could well be difficult to persuade at least some selected people, at least without great BCI/nanotech.
But yeah, this is one of the reasons why I’m still worried about AI takeover, and I absolutely agree with these points:
I’d argue this is an instrumental goal for all AIs, not just LLMs, but this is closer to a nitpick:
I think there are several potentially relevant categories of nanotech:
Drexlerian diamond phase nanotech. By Drexler’s own calculations, I recall that this would involve building systems with 10^15 atoms and very low error rates. Last I looked, this whole approach has been stuck at error rates above 80% per atom since the 90s. At least one expert with domain expertise argues that “machine phase” nanotech is likely a dead end, in Soft Machines. Summary: Liquid-phase self-assembly using Brownian motion is stupidly effective at this scale.
Non-trivial synthetic biology. If you buy either the existence proof of natural biology or the argument in Soft Machines, this road should still be open to an ASI. And maybe some descendant of AlphaFold could make this work! But it’s not clear that it offers an easy route to building enormous quantities of GPU-equivalents. Natural selection of single-cell organisms is fast, massively parallel, and ongoing for billions of years.
Engineered plagues. This probably is within reach of even humans, given enough resources and effort. A virus with a delayed mortality rate similar to MERS's and the transmissibility of post-Omicron strains of SARS-CoV-2 might very well be a "recipe for ruin" that's within reach of multiple nation-states. But critically, this wouldn't allow an ASI to build GPUs unless it already had robot mines and factories, and the ability to defend them from human retaliation.
So yeah, if you want to get precise, I don’t want to rule out (2) in the long run. But (2) is likely difficult, and it’s probably much more likely than (1).
If I strip my argument of all the details, it basically comes down to: “In the long run, superior intelligence and especially cheap superior intelligence wins the ability to make the important decisions.” Or some other versions I’ve heard:
“Improved technology, including the early steam engine, almost always created more and better jobs for horses. Right up until we had almost fully general replacements for horses.”
“Hey, I haven’t seen Homo erectus around lately.”
This isn’t an argument about specific pathways to a loss of control. Rather, it’s an argument that tireless, copyable, Nobel-prize-winner-level general intelligence which costs less than minimum wage has massive advantages (both economically and in terms of natural selection). In my case, it’s also an argument based on a strong suspicion that alignment of ASI cannot be guaranteed in the long term.
Basically, I see only three viable scenarios which turn out well:
AI fizzle. This would be nice, but I’m not counting on it.
A massive, terrifying incident leading to world-wide treaties against AI, backed up by military force. E.g., “Joint Chinese-US strike forces will bomb your data centers as hard as necessary to shut them down, and the UN and worldwide public will agree you had it coming.”
We ultimately lose control to the AI, but we get lucky, and the AI likes us enough to keep us as pets. We might be able to bias an inevitable loss of control in this direction, with luck. Call this the “Culture scenario.”
Buying time probably helps in scenarios (2) and (3), either because you have a larger window for attempted ASI takeover to fail spectacularly, or because you have more time to bias an inevitable loss of control towards a “humans as well-loved pets” scenario.
(I really need to write up a long-form argument of why I fear that long-term, guaranteed ASI alignment is not a real thing, except in the sense of "initially biasing ASI to be more benevolent pet owners".)
Even more technical details incoming:
In response to this:
I'm going to pass on the question of whether Drexlerian diamond-phase nanotech is possible: there are way too many competing explanations of what happened to nanotech in the 90s, and settling whether Drexlerian nanotech is possible isn't worthwhile enough, because I think the non-trivial synthetic biology path is probably enough to mostly replicate the dream of nanotech.
My reasons here come down to my belief that natural selection missed the potential for reversible computation. While reversible computers must still pay a minimum energy cost, it is far, far less than what irreversible computers must pay, and a fairly important part of my thinking is that, for whatever reason, natural selection just didn't make life intrinsically favor reversible over irreversible computation, meaning that an AI could exploit this to save energy. My other reason is that reversible computers can do everything normal computers can do computationally (a minimal illustration follows the linked paper below); this is an area where I just disagreed with @jacob_cannell the last time we talked about this.
Paper below to prove my point (PDF is available):
Logical reversibility of computation
https://www.semanticscholar.org/paper/Logical-reversibility-of-computation-Bennett/4c7671550671deba9ec318d867522897f20e19ba
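To make the "reversible gates can do everything ordinary logic can do" point concrete, here is a minimal illustration (my own toy example, not from Bennett's paper): the Toffoli gate is logically reversible (it is its own inverse) and, with one ancilla bit, implements NAND, which is enough in principle to build any Boolean circuit out of reversible gates.

```python
def toffoli(a: int, b: int, c: int) -> tuple:
    """Toffoli (controlled-controlled-NOT) gate on three bits.

    Reversible: applying it twice returns the original inputs.
    """
    return (a, b, c ^ (a & b))

# Reversibility check: the gate is its own inverse on all 8 inputs.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert toffoli(*toffoli(a, b, c)) == (a, b, c)

# Universality: fixing the target bit to 1 yields NAND(a, b),
# which suffices to build any Boolean circuit (cf. Bennett 1973).
nand = lambda a, b: toffoli(a, b, 1)[2]
assert [nand(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))] == [1, 1, 1, 0]
```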
And pretty importantly, this alone can get you a lot of OOMs: I estimated we could get about 15 OOMs of energy savings just by moving from the Landauer limit to the Margolus-Levitin limit, which is enough to let you explore far, far more design space than nature has so far:
https://www.lesswrong.com/posts/pFaLjmyjBKPdbptPr/does-biology-reliably-find-the-global-maximum-or-at-least#e9ji2ZLy4Aq92RmuN
The universal (at least until we get better physical models) bound on computation is in this paper, which you might like reading:
A Universal Constraint on Computational Rates in Physical Systems
https://arxiv.org/abs/2208.11196
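For reference, here are the standard statements of the two bounds being compared (the specific 15-OOM estimate is mine, from the linked comment; the bounds themselves are standard results):

```latex
% Landauer bound: minimum dissipation per irreversible bit erasure
E_{\text{Landauer}} = k_B T \ln 2 \approx 2.9 \times 10^{-21}\ \text{J} \quad (T = 300\,\text{K})

% Margolus-Levitin bound: maximum rate of distinct operations for a system
% with average energy E above its ground state
\nu_{\max} \le \frac{2E}{\pi \hbar}
```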
So my general intuition is that the fact that we can drastically lower energy expenditure is enough to make a lot of synthetic life design proposals much more viable than they would be otherwise, and that probably includes most of the specific nanotech examples Drexler proposed.
That said, I agree that this can be made difficult, especially if we apply AI control.
Now, on to the important meat of the discussion.
On this:
I think this argument is correct until this part:
I think this is actually not true: in the long term, it's certainly possible to value-align an ASI, though I agree that in the short term we will absolutely not be confident that our alignment techniques worked.
I do agree that even in good scenarios, the relationship between baseline humans and ASI will very likely look a lot more like the human-pet relationship, or the benevolent god/angel-human relationships of mythology and fiction, than like any other relationship. It's just that I count it as an alignment success if we can get this sort of outcome, because the only thing propping up the outcome is value alignment; if AIs were as selfish as, say, most billionaires, far worse outcomes from AI takeover would result.
And the role of the AI control agenda is in large part to make AI alignment safe to automate, which is why time matters here.
I do agree that something like AI takeover, in either a positive or negative direction, is very likely inevitable assuming continued AI progress.
I agree that reversible computation would be a very, very big deal. Has anyone proposed any kind of remotely plausible physical substrate that doesn’t get laughed out of the room by competent researchers in materials science and/or biochemistry? I haven’t seen anything, but I haven’t been looking in this area, either.
There are a few other possible computational game changers. For example, if you could get 200 to 500 superimposed qubits with error correction, you could likely do much more detailed simulations of exotic chemistry. And that, in turn, would give you lots of things that might get you closer to “factories in a box.”
So I can’t rule out an ASI finding some path to compact self-replication from raw materials. Biology did it once that we know of, after all. It’s more that (1) worlds in which an ASI can figure this out easily are probably doomed, and (2) I suspect that convincing humans to allow robot mines and factories is easier and quicker.
Unfortunately, I’ve never really figured out how to explain why I suspect robust alignment is impossible. The problem is that too many of my intuitions on this topic come from:
Working with Lisp developers who were near the heart of the big 80s AI boom. They were terrifyingly capable people, and they made a heroic effort to make “rule-based” systems work. They failed, and they failed in a way that convinced most of them that they were going down the wrong path.
Living through the 90s transition to statistical and probabilistic methods, which quickly outstripped what came before. (We could also have some dimensionality reduction, as a treat.)
Spending too much time programming robots, which is always a brutal lesson in humility. This tends to shatter a lot of naive illusions about how AI might work.
So rather than make an ironclad argument, I’m going to wave vaguely in the direction of my argument, in hope that you might have the right referents to independently recognize what I’m waving at. In a nutshell:
The world is complex, and you need to work to interpret it. (What appears in this video? Does the noisy proximity sensor tell us we’re near a wall?)
The output of any intelligent system is basically a probability distribution (or ranking) over the most likely answers. (I think the video shows a house cat, but it's blurry and hard to tell. I think we're within 4 centimeters of a wall, with an 80% probability of falling within 3-5 centimeters. I think the Roomba is in the living room, but there's a 20% chance we're still in the kitchen.)
The absolute minimum viable mapping between the hard-to-interpret inputs and the weighted output candidates is a giant, inscrutable matrix with a bunch of non-linearities thrown in. This is where all the hard-earned intuitions I mentioned above come in. In nearly all interesting cases, there is no simpler form.
And on top of this, “human values” are extremely poorly defined. We can’t specify what we want, and we don’t actually agree. (For a minority of humanity, “hurting the outgroup” is a fairly major value. For another very large minority, “making everyone submit to the authority I follow” is absolutely a value. See the research on “authoritarian followers” for more.)
So the problem boils down to ambiguous inputs, vague and self-contradictory policies, and probabilistic outputs. And the glue holding all this together is a multi-billion-parameter matrix with some non-linearities thrown in just for fun (a toy sketch of this kind of mapping appears at the end of this comment). And just in case that wasn't fun enough, any realistic system will also need to (1) learn from experience, and (2) design successor systems.
Even if you can somehow exert reasonable influence over the values of a system, the system will learn from experience, and it will spend a lot of its time far outside any training distribution. And eventually it will need to design a new system.
Fundamentally, once such a system is built, it will end up making its own decisions. Maybe, if we're lucky, we can bias it towards values we like and get a "benevolent pet owner" scenario. But a thousand years from now, the AIs will inevitably be making all the big decisions.
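Here's the toy sketch promised above, purely illustrative and deliberately tiny (real systems have billions of parameters, not thousands): a noisy, hard-to-interpret input goes through a matrix plus a non-linearity and comes out as a probability distribution over candidate answers.

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_classifier(x, W1, W2):
    """Map a noisy input vector to a probability distribution over answers."""
    h = np.tanh(W1 @ x)                  # big matrix + non-linearity
    logits = W2 @ h
    exp = np.exp(logits - logits.max())  # softmax: weighted output candidates
    return exp / exp.sum()

n_inputs, n_hidden, n_answers = 64, 128, 5
W1 = rng.normal(size=(n_hidden, n_inputs))  # in a real system: learned, inscrutable
W2 = rng.normal(size=(n_answers, n_hidden))

x = rng.normal(size=n_inputs)               # e.g. a noisy sensor reading
p = tiny_classifier(x, W1, W2)
print(p, p.sum())                           # five probabilities summing to 1
```

Nothing about W1 or W2 is individually interpretable; the system's "opinion" only exists as the output distribution.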
My thoughts on this:
The answer to the question about materials that would enable reversible computers more efficient than conventional ones is that currently they don't exist. But I interpret the lack of such materials so far not as much evidence that very efficient reversible computers are impossible, but rather as evidence that building computers at all is unusually difficult compared to other domains, mostly because of the contingencies of how our supply chains are set up, combined with the fact that so far there hasn't been much demand for reversible computation. And unlike most materials people want, here we aren't asking for a material that we know violates basic physical laws, which I suspect is the only reliable constraint on ASI in the long run.
I think it's pretty easy to make it quite difficult for the AI to figure out nanotech in the relevant time period, so I don't usually consider nanotech a big threat in AI takeover. And I think competent researchers not finding any plausible materials so far is a much better signal that this will take real-world experimentation or very high-end simulation (meaning it's pretty easy to stall for time) than a signal that such computers are impossible.
I explicitly agree with these 2 points, for the record:
On this part:
So I have a couple of points to make in response.
1 is that I think alignment progress is pretty separable from interpretability progress, at least in the short term, and I think a lot of the issues with rule-based systems came from expecting complete interpretability on the first go.
This is due to AI control.
2 is that this is why the alignment problem is defined as the problem of getting AIs to do what their creator/developer/owner/user intends them to do, whether or not that thing is good or bad from other moral perspectives; the goal is for arbitrary goals to be choosable without leading to perverse outcomes for the owner of the AI system.
This means that if an AI is aligned to even one human, that counts as an alignment success for the purposes of the alignment problem.
John Wentworth has a more complete explanation below:
https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem
3 is that I believe automating AI alignment is pretty valuable, and in the long run I don't expect alignment to look like a list of rules; I expect it to look like AIs optimizing in the world for human thriving. I don't necessarily expect the definition to be anything compact, and that's fine in my view.
4 is that alignment doesn't require the AI not taking over; it's fine if the AI takes over and makes us pets/we serve in Heaven. In particular, I pointed out that it's totally fine if the AIs make all the decisions, so long as they are near-perfectly or perfectly aligned to the human. What I mean is that the human delegates all of the tasks to the AI; it's just that the values are decided by the humans at the start of the AI explosion, even if those values aren't compact and the AI is entirely autonomous in working for the human after that.
The best explanation of how value alignment is supposed to work comes from @Thane Ruthenis's post below on what a utopia-maximizer would look like:
https://www.lesswrong.com/posts/okkEaevbXCSusBoE2/how-would-an-utopia-maximizer-look-like
(Edited due to a difficult-to-understand reaction by @Vladimir_Nesov, whose ideas can often be pretty confusing to newcomers, so that was a strong signal my words weren't clear enough.)
(Edit 2: I changed "goals" to "values", as I apparently didn't clarify that goals in my ontology basically correspond to values/morals and are terminal, not instrumental, goals, and I added a link to clarify how value alignment might work.)
5 is that to the extent interpretability works on AI, I expect its use case to be not understanding everything, but rather intervening on AIs even when we don't have labeled data.
From Sam Marks:
And I think this is very plausible even if your interpretability of an AI isn't complete or nearly complete.
But that’s my response to why I think aligning AI is possible at all.
It’s clearer now what you are saying, but I don’t see why you are attributing that point to me specifically (it’s mostly gesturing at value alignment as opposed to intent alignment).
This sounds like permanent disempowerment. Intent alignment to bad decisions would certainly be a problem, but that doesn’t imply denying opportunity for unbounded growth, where in particular eventually decisions won’t have such issues.
If goals are “decided”, then it’s not value alignment, and bad decisions lead to disasters.
(Overall, this framing seems unhelpful when given in response to someone arguing that values are poorly defined.)
Fully agree with everything in this post, this is exactly my model as well. (That’s the reason behind my last-line rug-pull here, by the way.)
Empirically, training a group of LLMs from random initialization in a shared environment with zero tokens of grammatical language in their training data does seem to get them to spontaneously emit tokens with interpretable meaning. From Emergence of Grounded Compositional Language in Multi-Agent Populations (Mordatch & Abbeel, 2017):
Do you expect that scaling up that experiment would not result in the emergence of a shared grammatical language? Is this a load-bearing part of your expectation of why transformer-based LLMs will hit a scaling wall?
If so, that seems like an important crux that is also quite straightforward to test, at least relative to most of the cruxes people have on here which have a tendency towards unfalsifiability.
Cool paper, thanks!
That arxiv paper isn’t about “LLMs”, right? Really, from my perspective, the ML models in that arxiv paper have roughly no relation to LLMs at all.
No … I brought this up to make a narrow point about imitation learning (a point that I elaborate on much more in §2.3.2 of the next post), namely that imitation learning is present and very important for LLMs, and absent in human brains. (And that arxiv paper is unrelated to this point, because there is no imitation learning anywhere in that arxiv paper.)
I agree with you that a system that learns efficiently can foom (improve rapidly with little warning). This is why I've been concerned with short timelines for LLM-based systems if they have online, self-directed learning added in the form of RAG and/or fine-tuning (e.g. LLM AGI will have memory).
My hope for those systems and for the more brainlike AGI you're addressing here is that they learn badly before they learn well. I hope that seeing a system learn (and thereby self-improve) before one's eyes brings the gravity of the situation into focus. The majority of humanity thinks hard about things only when they're immediate and obviously important. So the speed of takeoff is critical.
I expect LLM-based AGI to arrive before strictly brainlike AGI, but I actually agree with you that LLMs themselves are hitting a wall and would not progress (at least quickly) to really transformative AGI. I am now an LLM plateauist. Yet I still think that LLMs that are cleverly scaffolded into cognitive architectures can achieve AGI. I think this is probably possible even with current LLMs, memory systems, and clever prompting (as in Google’s co-scientist) once all of those are integrated.
But what level of AGI those systems start at, and how quickly they progress beyond human intelligence, matter a lot. That's why your prediction of rapid progress for brainlike AGI is alarming and makes me think we might be better off trying to achieve AGI with scaffolded LLMs. I think early LLM-based agents will be general in that they can learn about any new problem or skill, but they might start below the human level of general intelligence, and progress slowly beyond the human level. That might be too optimistic, but I'm pinning much of my hope for successful alignment here, because I do not think the ease of aligning LLMs means that fully general agents based on them will be easy to align.
Such an architecture might advance slowly because it shares some weaknesses of LLMs, and through them, shares some limitations of human thought and human learning. I very much hope that brainlike AGI like you’re envisioning will also share those weaknesses, giving us at least a hope of controlling it long enough to align it, before it’s well beyond our capabilities.
You don't think that slow progression of brainlike AGI is likely. That's fascinating because we share a pretty similar view of brain function. I would think that reproducing cortical learning would require a good deal of work and experimentation, and I wouldn't expect working out the "algorithm" to happen all at once or to be vastly more efficient than LLMs (since they are optimized for the computers used to simulate them, whereas cortical learning is optimized for the spatially localized processing available to biology). Sharing your reasoning would be an infohazard, so I won't ask. I will ask you to consider privately whether it isn't more likely that such systems work badly for a good while before they work well, giving their developers a little time to seriously think about their dangers and how to align them.
Anyway, your concern with fooming brainlike AGI shares many of my concerns with self-teaching LLM agents. They also share many of the same alignment challenges. LLM agents aren't currently RL agents even though the base networks are partially trained with RL; future versions might be closer to model-based RL agents, although I hope that's too obviously dangerous for the labs to adopt as their first approach. The only real advantage to aligning LLM agents over model-based RL agents seems to be their currently-largely-faithful chains of thought, but that's easy to lose if developers decide that's too large an alignment tax to pay.
Speed of takeoff also would seem to modulate alignment difficulty pretty dramatically, so I hope you’re wrong that there’s a breakthrough waiting to be made in understanding the cortical learning algorithm. I spent a lot of time thinking about cortical learning, since I worked for a long time in one of the labs making those detailed models of cortical function. But I spent more time thinking about system-level interactions and dynamics, because it seemed clear to me that the available data and integration techniques (hand-built network simulations that were allowed to vary to an unspecified degree between toy benchmarks) weren’t adequate to constrain detailed models of cortical learning.
Anyway, it seems possible you’re right. I hope there aren’t breakthroughs in either empirical techniques or theory of cortical function soon.
I agree; working out the “algorithm” is already happening, and has been for decades. My claim instead is that by the time you can get the algorithm to do something importantly useful and impressive—something that LLMs and deep learning can’t already do much cheaper and better—then you’re almost at ASI. Note that we have not passed this threshold yet (no offense). See §1.7.1.
I think people will try to get the algorithms to work efficiently on computers in the toy-model phase, long before the algorithms are doing anything importantly useful and impressive. Indeed, people are already doing that today (e.g.). So in that gap between “doing something importantly useful and impressive” and ASI, people won’t be starting from scratch on the question of “how do we make this run efficiently on existing chips”, instead they’ll be building on all the progress they made during the toy-model phase.
I’m surprised you think that the brain’s algorithm is SO simple that it must be discovered soon and ~all at once. This seems unlikely to me (reality has a surprising amount of detail). I think you may be underestimating the complexity because:
Though I don't know enough biochem to say for sure, I'm guessing many "bits of the algorithm" are external to the genes (epigenetic?). Specifically, I don't just mean data like education materials that is learned; I mean that actual pieces of the algorithm are probably constructed "in motion" by other machinery in the cell/womb/etc. Also, insofar as parts of the algorithm come in the form of channels that can be made available to AGI, it's possible that AGI would have to be very brain-like to absorb them correctly (because they are specifically the missing parts of an incomplete algorithm).
Closer to my specialization: whatever information does appear in the genome is probably compressed (i.e., once redundancy is removed). That means it will look like noise, meaning it will be hard to predict, and also presumably hard to discover. So, the bit content should not be imagined as bits of an elegant Python program. It might pack in more conceptual pieces than you seem to expect.
I definitely don’t think we’ll get AGI by people scrutinizing the human genome and just figuring out what it’s doing, if that’s what you’re implying. I mentioned the limited size of the genome because it’s relevant to the complexity of what you’re trying to figure out, for the usual information-theory reasons (see 1, 2, 3). “Machinery in the cell/womb/etc.” doesn’t undermine that info-theory argument because such machinery is designed by the genome. (I think the epigenome contains much much less design information than the genome, but someone can tell me if I’m wrong.)
…But I don’t think the size of the genome is the strongest argument anyway.
A stronger argument IMO (copied from here) is:
…And an even stronger argument IMO is in [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain, especially the section on “cortical uniformity”, and parts of the subsequent post too.
Also, you said “the brain’s algorithm”, but I don’t expect the brain’s algorithm in its entirety to be understood until after superintelligence. For example there’s something in the brain algorithm that says exactly which muscles to contract in order to vomit. Obviously you can make brain-like AGI without reverse-engineering that particular bit of the brain algorithm. More examples in the “Brain complexity is easy to overstate” section here.
RE “soon”, my claim (§1.9) was “probably within 25 years” but not with overwhelming confidence.
RE “~all at once”, see §1.7.1 for a very important nuance on that.
I wouldn’t say that “in 25 years” is “soon”, and 5-25 years seems like a reasonable amount of uncertainty.
What are your timelines?
This is an insane AGI definition/standard. Very few humans can build billion-dollar businesses, and the few who can take years to do so. If that were the requirement for AGI, almost all humans wouldn't qualify. Indeed, if an AI could make a billion-dollar-a-year business on demand, I'd wonder whether it was (weak) ASI.
(Not saying that current systems qualify as AGI, though I would say they’re quite close to what I’d call weak AGI. They do indeed have severe issues with time horizons and long-term planning. But a reasonable AGI definition shouldn’t exclude the vast majority of humans.)
Sorry if it’s unclear (I’m open to rewording), but my intention was that the link in the first sentence was my (loose) definition of AGI, and the following sentences were not a definition but rather an example of something that AI cannot do yet.
I deliberately chose an example where it’s just super duper obvious that we’re not even remotely close to AI succeeding at the task, because I find there are lots of LLM-focused people who have a giant blind spot: They read the questions on Humanity’s Last Exam or whatever, and scratch their head and say “C’mon, when future LLMs saturate the HLE benchmark, what else is there? Look how hard those questions are! They’re PhD level in everything! If that’s not superintelligence, what is?” …And then my example (autonomously founding a company and growing it to $1B/year revenue over the course of years) is supposed to jolt those people into saying “ohhh, right, there’s still a TON of headroom above current AI”.
I fully agree with you that AGI should be able to figure out things it doesn't know, and that this is a major blind spot in benchmarks. (I often give novel problem-solving as a requirement, which is very similar.) My issue is that there is a wide range of human abilities in this regard. Most/all humans can figure things out to some extent, but most aren't that good at it. If you give a genius an explanation of basic calculus and a differential equation to figure out how to solve, it won't be that difficult. If you give the same task to an average human, it isn't happening. Describing AGI as being able to make a $1B/yr company or develop innovative science at a John von Neumann level is describing a faculty that most/all humans have, but at a level vastly above where most humans are.
Most of my concern about AI (why I am, unlike you, most worried about improved LRMs) stems from the fact that current SOTA systems have an ability to figure things out that is within the human range and fairly rapidly climbing across it. (Current systems do have limitations that few humans have in other faculties like time horizons and perception, but those issues are decreasing with time.) Also, even if we never reach ASI, AI having problem-solving on par with normal smart humans, especially when coupled with other faculties, could have massively bad consequences.
There may be important differences in the details, but I’ve been surprised by how similar the behavior is between LLMs and humans. That surprise is in spite of me having suspected for decades that artificial neural nets would play an important role in AI.
It seems far-fetched that a new paradigm is needed. Saying that current LLMs can’t build billion-dollar companies seems a lot like saying that 5-year-old Elon Musk couldn’t build a billion-dollar company. Musk didn’t seem to need a paradigm shift to get from the abilities of a 5-year-old to those of a CEO. Accumulation of knowledge seems like the key factor.
But thanks for providing an argument for foom that is clear enough that I can be pretty sure why I disagree.
Oh no, I didn't realize your perspective was this gloomy. But it makes a lot of sense. Actually, it mostly comes down to this: you can just dispute the consensus[1] that the classically popular Yudkowskyian/Bostromian views have been falsified by the rise of LLMs. If they haven't, then fast takeoff now is plausible for mostly the same reasons that we used to think it's plausible.
I think there is some merit to just asking these people to do something else. Maybe not a lot of merit, but a little more than zero, at least for some of them. Especially if they are on this site. Not with a tweet, but by using your platform here. (Plausibly you have already considered this and have good reasons for why it’s a terrible idea, but it felt worth suggesting.)
I’m not sure if this is in fact a consensus, but it sure feels that way
Oh, interesting.
It sounds like you're saying that there's a yet-undiscovered grand theory of neuroscience, and also a bunch of "stamp collecting" about the details. We've done a lot of the stamp collecting already, and the grand theory would unlock the utility of the stamps that we've collected?
If this is right, then it seems like AI governance is completely and resoundingly fucked, and we’re back to the pre-2021 MIRI paradigm of thinking that we need to solve alignment before AGI is invented.
“completely and resoundingly fucked” is mildly overstated but mostly “Yes, that’s my position”, see §1.6.1, 1.6.2, 1.8.4.
I strongly agree with this post, but one question:
Assuming there exists a simple core of intelligence, then that simple core is probably some kind of algorithm.
When LLMs learn to predict the next token of a very complex process (like computer code or human thinking), they fit very high level patterns, and learn many algorithms (e.g. addition, multiplication, matrix multiplication, etc.) as long as those algorithms predict the next token well in certain contexts.
Now maybe the simple core of intelligence is too complex an algorithm to be learned when predicting a single next token.
However, a long chain-of-thought can combine these relatively simple algorithms (for predicting one next token) in countless possible ways, forming tons of more advanced algorithms, with a lot of working memory. Reinforcement learning on the chain-of-thought can gradually discover the best advanced algorithms for solving a great variety of tasks (any task which is cheaply verifiable).
Given that evolution used brute force to create the human brain, don’t you think it’s plausible for this RL loop to use brute force to rediscover the simple core of intelligence?
PS: This is just a thought, not a crux. It doesn’t conflict with your conclusions, since LLM AGI being a possibility doesn’t mean non-LLM AGI isn’t a possibility. And even if the simple core of intelligence was discovered by RL of LLMs, the consequences may be the same.
New large-scale learning algorithms can in principle be designed by (A) R&D (research taste, small-scale experiments, puzzling over the results, iterating, etc.), or (B) some blind search process. All the known large-scale learning algorithms in AI to date, from the earliest Perceptron to the modern Transformer, have been developed by (A), not (B). (Sometimes a few hyperparameters or whatever are set by blind search, but the bulk of the real design work in the learning algorithm has always come from intelligent R&D.) I expect that to remain the case: See Against evolution as an analogy for how humans will create AGI.
Or if you’re talking about (A), but you’re saying that LLMs will be the ones doing the intelligent R&D and puzzling over learning algorithm design, rather than humans, then … maybe kinda, see §1.4.4.
Sorry if I’m misunderstanding.
To be honest I’m very unsure about all of this.
I agree that (B) never happened. Another way of saying this, is that “algorithms for discovering algorithms” have only ever been written by humans, and never directly discovered by another “algorithm for discovering algorithms.”
The LLM+RL "algorithm for discovering algorithms" is far less powerful than the simple core of intelligence, but far more powerful than any other "algorithm for discovering algorithms" we've ever had before, since it has discovered algorithms for solving IMO-level math problems.
Meanwhile, the simple core of intelligence may also be the easiest “algorithm for discovering algorithms” to discover (by another such algorithm). This is because evolution found it (and the entire algorithm fits inside the human genome), and the algorithm seems to be simple. The first time (B) happens, may be the only time (B) happens (before superintelligence).
I think it’s both plausible that the simple core of intelligence is found by human researchers, and that it just emerges inside a LLM with much greater effective scale (due to being both bigger and more efficient), subject to much greater amounts of chain-of-thought RL.
This last conjunction doesn’t seem valid.
A human brain is useless without ~5 to 15 years of training, training specifically scheduled to hit the right developmental milestones at the correct times. More if you think grad school matters. This training requires the active intervention of at least one, and usually many more than one, adult, who laboriously run RL schedules on the brain to try to overcome various bits of inertia or native maladaptation of the untrained system. If you raise a human brain without such training, it becomes basically useless for reasoning, inventing new scientific paradigms from scratch, and so on and so forth.
So—given that such theories were accurate—we would only expect the brains created via simulations utilizing these theories to be able to do all these things if we had provided the simulations with a body with accurate feedback, given it 5 to 15 years of virtual training involving the active intervention of adults into its life with appropriate RL schedules, etc. Which, of course, we have not done.
Like I think your explanation above is clear, and I like it, but feel a sense of non-sequitur that I’ve gotten in many other explanations for why we expect some future algorithm to supersede LLMs.
I was talking about existing models in the literature of what the 6ish different layers of the cortex do and how. These models are so extremely basic that it’s obvious to everyone, including their authors, that they are incomplete and basically useless, except as a step towards a future better model. I am extremely confident that there is no possible training environment that would lead a collaborative group of these crappy toy models into inventing language, science, and technology from scratch, as humans were able to do historically.
Separately, when someone does figure out what the 6ish different cortex layers do and how (in conjunction with the thalamus, basal ganglia, etc.), I think that developing an adequate training environment would not be nearly as hard as you’re suggesting. (I mean adequate for capabilities—I’m not talking about alignment here.) There exist humans who are very smart, energetic, and eager to get things done. They don’t need to be dragged through school and given mandatory homework! Quite the contrary: Give them access to a library or the internet, and they’ll teach themselves math, or programming, or engineering, or whatever strikes their fancy, all the way up to expert level and beyond. If they get stuck, then they will take the initiative to figure out how to get unstuck, including possibly emailing questions to experts or whatever.
(See also a discussion of popular misconceptions related to child-rearing in my post Heritability: Five Battles §2.5.1.)
Also, I don't expect training to take "5-15 years", for reasons in §1.8.1.1.
I feel like we’re failing to communicate. Let me recapitulate.
So, your argument here is modus tollens:
1. If we had a "correct and complete" version of the algorithm running in the human cortex (and elsewhere), then the simulations would be able to do all that a human can do.
2. The simulations cannot do all that a human can do. Therefore we do not, etc.
I’m questioning 1, by claiming that you need a good training environment + imitation of other entities in order for even the correct algorithm for the human brain to produce interesting behavior.
You respond to this by pointing out that bright, intelligent, curious children do not need school to solve problems. And this is assuredly true. Yet: bright, intelligent, curious children still learned language and an enormous host of various high-level behaviors from imitating adults; they exist in a world with books and artifacts created by other people, from which they can learn; etc, etc. I’m aware of several brilliant people with relatively minimal conventional schooling; I’m aware of no brilliant people who were feral children. Saying that humans turn into problem solving entities without plentiful examples to imitate seems simply not true, and so I remain confident that 1 is a false claim, and the point that bright people exist without school is entirely compatible with this.
Maybe so, but that’s a confidence that you have entirely apart from providing these crappy toy models an actual opportunity to do so. You might be right, but your argument here is still wrong.
Humans did not, really, "invent" language, in the same way that Dijkstra invented an algorithm. The origin of language is subject to dispute, but it's probably something that happened over centuries or millennia, rather than all at once. So—if you had an algorithm that could invent language from scratch, I don't think it's reasonable to expect it to do so unless you give it centuries or millennia of compute, in a richly textured environment where it's advantageous to invent language. Which, of course, we have come absolutely nowhere close to doing.
From my perspective you’re being kinda nitpicky, but OK sure, I have now reworded from:
“Remember, if the theories were correct and complete, the corresponding simulations would be able to do all the things that the real human cortex can do…”, to:
“Remember, if the theories were correct and complete, then they could be turned into simulations able to do all the things that the real human cortex can do…”
…and the “could” captures the fact that a simulation can also fail in other ways, e.g. you need to ensure adequate training environment, bug-free code, adequate speed, good hyperparameters, and everything else.
Again, I don’t think “setting up an adequate training environment for ASI capabilities” will be a hard thing for a future programmer to do, but I agree that it’s a thing for a future programmer to do. Some programmer needs to actually do it. It doesn’t just happen automatically. We are in agreement at least about that. :)
When I say “not hard”, what do I have in mind? Well, off the top of my head, I’d guess that a minimal-effort example of a training environment that would probably be adequate for ASI capabilities (but not safety or alignment) (given the right learning algorithm and reward function) would involve an interface to existing RL training environments where the baby-AGI can move around and stack blocks and so on, plus free two-way access to the whole internet, especially YouTube.
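To make that concrete, here is a purely hypothetical sketch (my own illustration; every name in it is made up) of the kind of minimal interface I have in mind: a wrapper exposing both an off-the-shelf embodied-RL environment and web access as actions.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Observation:
    camera: Any = None               # pixels from the simulated body, on motor steps
    proprioception: Any = None       # joint angles etc., on motor steps
    web_page: Optional[str] = None   # fetched content, on web steps

class MinimalTrainingEnv:
    """Hypothetical wrapper: an existing robotics env plus a web-access channel."""

    def __init__(self, robotics_env, web_client):
        self.robotics_env = robotics_env  # e.g. a block-stacking simulator
        self.web_client = web_client      # e.g. an HTTP/YouTube fetcher

    def step(self, action: dict):
        if action["kind"] == "motor":
            obs, reward, done, info = self.robotics_env.step(action["command"])
            return Observation(camera=obs.get("camera"),
                               proprioception=obs.get("joints")), reward, done, info
        if action["kind"] == "web":
            page = self.web_client.get(action["url"])
            return Observation(web_page=page), 0.0, False, {}
        raise ValueError(f"unknown action kind: {action['kind']!r}")
```

The point isn't the code's sophistication; it's that the environment side of the story really could be this mundane, given the right learning algorithm and reward function.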
I disagree—as I mentioned in the article, a group of kids growing up with no exposure whatsoever to grammatical language will simply create a new grammatical language from scratch, as in Nicaraguan Sign Language and creoles.
I think that's a characteristic of people talking about different things from within different basins of Traditions of Thought. The points one side makes seem either kinda obvious or weirdly nitpicky in a confusing and irritating way to people on the other side. Like, to me, what I'm saying seems obviously central to the whole issue of high p-dooms genealogically descended from Yudkowsky, and confusions around this seem central to stories about high p-doom, rather than nitpicky and stupid.
Thanks for amending though, I appreciate. :) The point about Nicaraguan Sign Language is cool as well.
What do we make of RLVR on top of strong base models? Doesn't this seem likely to learn genuinely new classes of problem currently unsolvable by humans? (I suppose it requires us to be able to write reward functions, but we have Lean and the economy and nature, which are glad to provide rewards even if we don't know the solution ahead of time.)
I talk about RLVR a bunch in the next post (but from an alignment rather than capabilities perspective).
I wasn’t bringing up imitation learning here to argue that LLMs will not scale to AGI (which I believe, but was not trying to justify in this post), but rather to explain a disanalogy between how LLM capabilities have grown over time, versus the alleged future scary paradigm.
If you like, you can replace that text with a weaker statement “Up through 2024, the power of LLMs has come almost entirely from imitation learning on human text…”. That would still work in the context of that paragraph. (For the record, I do think the stronger statement as written is also valid. We’ll find out one way or the other soon enough!)
Excellent post, thank you for taking the time to articulate your ideas in a high-quality and detailed way. I think this is a fantastic addition to LessWrong and the Alignment Forum. It offers a novel perspective on AI risk and does so in a curious and truth-seeking manner that’s aimed at genuinely understanding different viewpoints.
Here are a few thoughts on the content of the first post:
I like how it offers a radical perspective on AGI in terms of human intelligence and describes the definition in an intuitive way. This is necessary as AGI is increasingly being redefined as something like “whatever LLM comes out next year”. I definitely found the post illuminating, and it resulted in a perspective shift for me because it described an important but neglected vision of how AGI might develop. It feels like the discourse around LLMs is sucking the oxygen out of the room, making it difficult to seriously consider alternative scenarios.
I think the basic idea in the post is that LLMs are built by applying an increasing amount of compute to transformers trained via self-supervised or imitation learning, but that LLMs will be replaced by a future brain-like paradigm that will need much less compute while being much more effective.
This is a surprising prediction because it seems to run counter to Rich Sutton’s bitter lesson, which observes that, historically, general methods that leverage computation (like search and learning) have ultimately proven more effective than those that rely on human-designed cleverness or domain knowledge. The post seems to predict a reversal of this long-standing trend (or I’m just misunderstanding the lesson), where a more complex, insight-driven architecture will win out over simply scaling the current simple ones.
On the other hand, there is an ongoing trend of algorithmic progress and increasing computational efficiency which could smoothly lead to the future described in this post (though the post seems to describe a more discontinuous break between current and future AI paradigms).
If the post’s prediction comes true, then I think we might see a new “biological lesson”: brain-like algorithms will replace deep learning which replaced GOFAI.
Thanks!
No, I’m also talking about “general methods that leverage computation (like search and learning)”. Brain-like AGI would also be an ML algorithm. There’s more than one ML algorithm. The Bitter Lesson doesn’t say that all ML algorithms are equally effective at all tasks, nor that there are no more ML algorithms left to discover, right? If I’m not mistaken, Rich Sutton himself is hard at work trying to develop new, more effective ML algorithms as we speak. (alas)
Can you expand on your argument for why LLMs will not reach AGI? Like, what exactly is the fundamental obstacle they will never pass? So far they are successfully doing longer and longer (for humans) tasks: https://benjamintodd.substack.com/p/the-most-important-graph-in-ai-right
Nor can I see why, in a few generations, LLMs won’t be able to run a company, as you suggested. Moreover, I don’t see why that is necessary to get to AGI. LLMs are already good at solving complicated, Ph.D.-level mathematical problems, and they keep improving. Essentially, we just need an LLM version of an AI researcher. To create ASI you don’t need a billion Sam Altmans, you need a billion Ilya Sutskevers. Is there any reason to assume an LLM will never be able to become an excellent AI researcher?
They’re not. I work a lot with math, and o3 is useful for asking basic questions about domains I’m unfamiliar with and pulling up relevant concepts/literature. But if you ask it to prove something nontrivial, 95+% of the time it will invite you for a game of “here’s a proof that 2 + 2 = 5, spot the error!”.
That can also be useful: it’s like dropping a malfunctioning probe into a cave and mapping out its interior off of the random flashes of light and sounds of impact the probe creates as it’s haphazardly ricocheting around. But while I’m under no illusions about an average PhD, I do think they’re a little more useful than this.
Are you working with a SOTA model? Here, mathematicians report a quite different story: https://www.scientificamerican.com/article/inside-the-secret-meeting-where-mathematicians-struggled-to-outsmart-ai/
I guess “good at” was improper wording. I did not mean that they do not produce nonsense. I meant that sometimes they can produce a correct solution. It is like a person who may not be fit to run 100 meters in 10 seconds every day, but even if they do it in 5% of cases, that is already impressive and shows that it is possible in principle. And I guess “Ph.D. level” sounded like they can write a Ph.D. thesis from scratch. I just meant that there are short, well-formulated problems that would take a Ph.D. student a few hours, if not a few days, which current LLMs can solve in a non-negligible fraction of cases.
The main nuance that your description misses out on is that these are very specific kinds of problems:
That excludes nearly all of research math.
The account in that story contradicts other, less openly enthusiastic reports that I’ve seen of o4-mini’s performance, e.g. Greg Burnham:
and Daniel Litt (his opening message says he’s testing both o3 and o4-mini but later only talks about testing “it”; but I presume that if he tested both, he’s referring to whichever got better results; this is supported by the “Incidentally, o4-mini-high seems a bit better than o3 for this kind of thing, though I’m not confident about that” later in the thread):
I’m generally not very enthusiastic about arguing with people about whether LLMs will reach AGI.
If I’m talking to someone unconcerned about x-risk, just trying to make ASI as fast as possible, then I sure don’t want to dissuade them from working on the wrong thing (see §1.6.1 and §1.8.4).
If I’m talking to someone concerned about LLM x-risk, and thus contingency planning for LLMs reaching AGI, then that seems like a very reasonable thing to do, and I would feel bad about dissuading them too. After all, I’m not that confident—I don’t feel good about building a bigger community of non-LLM-x-risk mitigators by proselytizing people away from the already-pathetically-small community of LLM-x-risk mitigators.
…But fine, see this comment for part of my thinking.
I’m curious what you imagine the billion Ilya Sutskevers are going to do. If you think they’re going to invent a new better AI paradigm, then we have less disagreement than it might seem—see §1.4.4. Alternatively, if you think they’re going to build a zillion datasets and RL environments to push the LLM paradigm ever farther, then what do you make of the fact that human skill acquisition seems very different from that process (see §1.3.2 and this comment)?
One thing I disagree with is the idea that there is only one “next paradigm AI” with specific properties.
I think there is a wide spectrum of next-paradigm AIs, some safer than others. Brain-like AIs are just one option out of a large possibility space.
And if the AI is really brain-like, that suggests making an AI that’s altruistic for the same reason some humans are. Making a bunch of IQ-160, 95th-percentile-kindness humans, and basically handing the world over to them, sounds like a pretty decent plan.
A single point of failure also means a single point of success.
It could be much worse. We could have 100s of points of failure, and if anything goes wrong at any of those points, we are doomed.
We are already seeing problems with ChatGPT-induced psychosis. And seeing LLMs that kinda hack a bit.
What does the world look like if it is saturated with moderately competent hacking/phishing/brainwashing LLMs? Yes, a total mess. But a mess with less free energy, perhaps? Especially if humans have developed some better defenses. Probably still a lot of free energy, but less.
I’m not sure exactly what it means for LLMs to be on a “continuous path towards ASI”.
I’m pretty sure that LLMs aren’t the pinnacle of possible mind design.
So the question is: will better architectures be invented by a human or an LLM, and how scaled will the LLM be when this happens?
I talk about that a bit in §1.4.4.
I am a long-time volunteer with the organization bearing the name PauseAI. Our message is that increasing AI capabilities is the problem—not which paradigm is used to get there. The current paradigm is dangerous in some fairly legible ways, but that doesn’t at all imply that other paradigms are any better. Any effort to create increasingly capable and increasingly general AI systems ought to be very illegal unless paired with a robust safety case, and we mostly don’t tie this to the specifics of LLMs.
Yeah, restricting the creation and dissemination of most AGI-related research is definitely a much harder ask. I can imagine a world that has an appetite for that kind of invasive regulation (if it is necessary), but it would probably require intervening steps to get there, including first regulating only the biggest players in the AGI race (which is a very popular idea across all political spectra in the western world).
My overall p(doom from AI by 2040) is about 70%, which shows pessimism on my part as well. But of course, that’s why I’m trying so hard. My ranking of “ways we survive” from most to least likely goes: Robust Governance Solutions > Sheer Dumb Luck > Robust Technical Solutions. So advocacy is where I spend my time.
In any case, a world that is more aware of the problem is one that is more likely to solve it by some means or another. I’m working to buy us some luck, so to speak.
Unfortunately I am not going to read this post now, for prioritization reasons, but wow, your introduction is so good. I feel very predicted by the explanation of what foom means and by the “[b]ut before you go”, which is exactly the point at which I thought about closing the tab.
Just to be clear, your position is that 25 years from now, when LLMs are trained using trillions of times as much compute and are routinely doing tasks that take humans months to years, they will still be unable to run a business worth $1B?
I think your comment is poorly worded, in that you’re stating certain trend extrapolations as facts rather than hypotheses. But anyway, yes my position is that LLMs (including groups of LLMs) will be unable to autonomously write a business plan and then found a company and grow it to $1B/year revenue, all with zero human intervention, 25 years from now.
The thing I actually expect is “LLMs with lots of RL training on diverse gamelike environments and problem sets, and some algorithmic tweaks”. Do you not expect that to work, or just that by the time it does work, it will have evolved sufficiently far beyond the current LLM paradigm that the resulting model will be better thought of as a new kind of thing?
I would include “constructivist learning” in your list, but I agree that LLMs seem capable of this. By “constructivist learning” I mean a scientific process where the learner conceives of an experiment on the world, tests the idea by acting on the world, and then learns from the result. A VLA model with incremental learning seems close to this. RL could be used for the model update, but I think for ASI we need learning from real-world experiments.
I think this is a super important post. Thanks for publishing it!
One question that occurred to me while reading:
You assume that we will have a massive compute overhang once we have this new architecture. Is there a reason to expect that GPUs would remain useful? Or should we expect that a new architecture that’s sufficiently far away from the DL paradigm would actually need some new type of hardware? I really don’t know the answer to this, so it would be cool if you could shed some light on it. I guess if efficiency gains are sufficiently large with a new architecture, then this becomes somewhat moot.
I don’t think GPUs would be the best of all possible chip designs for the next paradigm, but I expect they’ll work well enough (after some R&D on the software side, which I expect would be done early on, during the “seemingly irrelevant” phase, see §1.8.1.1). It’s not like any given chip can run one and only one algorithm. Remember, GPUs were originally designed for processing graphics :) And people are already today running tons of AI algorithms on GPUs that are not deep neural networks (random example).
I concur with that sentiment. GPUs hit a sweet spot between compute efficiency and algorithmic flexibility. CPUs are more flexible for arbitrary control logic, and custom ASICs can improve compute efficiency for a stable algorithm, but GPUs are great for exploring new algorithms where SIMD-style control flows exist (SIMD=single instruction, multiple data).
My expectation is that it’d be possible to translate any such architecture into a format that would efficiently run on GPUs/TPUs with some additional work, even if its initial definition would be e.g. neurosymbolic.
Though I do think it’s an additional step that the researchers would need to think of and execute, which might delay the doom for years (if it’s too inefficient in its initial representation).
Isn’t this because humans have a hard-coded “language instinct”?
It sounds like you’re suggesting that inventing grammar is the convergent result of a general competency?
There are some caveats, but more-or-less, yeah. E.g. the language-processing parts of the cortex look pretty much the same as every other part of the neocortex. E.g. some people talk about how language is special because it has “recursion”, but in fact we can also handle “recursion” perfectly well in vision (e.g. we can recognize a picture inside a picture), planning (e.g. we can make a plan that incorporates a sub-plan), etc.
Incidentally, isn’t this true of most humans?
Almost everyone in the economy has a manager whose role is to assign tasks and keep the larger project on track. Some humans seem to have the capability to found and run large-scale projects without recourse to anyone but their own orienting and reasoning, but not most humans.
Yeah to some extent, although that can be a motivation problem as well as a capability problem. Depends on how large is the “large scale project”.
I think almost all humans can and do “autonomously” execute projects that are well beyond today’s LLMs. I picked a hard example (founding and growing a company to $1B/year revenue) just for clarity.
Random website says 10% of the USA workforce (and 50% of the global workforce!?) is self-employed.
I think a big difference between functional organizations and dysfunctional bureaucracies is that the employees at functional organizations are aware of the larger project and how their work fits into it, and want the larger project to succeed, and act accordingly. So even if those people have a manager, it’s kinda a different relationship.
I think it depends on the context. It’s the norm for employees in companies to have managers, though as @Steven Byrnes said, this is partially for motivational purposes, since the incentives of employees are often not fully aligned with those of the company. So this example is arguably more of an alignment problem than a capability problem.
I can think of some other examples of humans acting in highly autonomous ways:
To the best of my knowledge, most academics and PhD students are expected to publish novel research in a highly autonomous way.
Novelists can work with a lot of autonomy when writing a book (though they’re a minority).
There are also a lot of personal non-work goals like saving for retirement or raising kids which require high autonomy over a long period of time.
Small groups of people like a startup can work autonomously for years without going off the rails like a group of LLMs probably would after a while (e.g. the Claude bliss attractor).
This post provides a good overview of some topics that I think need attention from the ‘AI policy’ people at national levels. AI policy (such as the US and UK AISI groups) has been focused on generative AI and, recently, agentic AI to understand near-term risks. Whether we’re talking about LLM training and scaffolding advances, or a new AI paradigm, there is new risk when AI begins to learn from experiments in the world or to reason about its own world model. In child development, imitation learning focuses on learning from examples, while constructivist learning focuses on learning by reflecting on interactions with the world. Constructivist learning is, I expect, key to pushing past AGI to ASI, and it carries obvious risks to alignment beyond imitation learning.
In general, I expect something LLM-like (i.e. transformer models or an improved derivative) to be able to reach ASI with a proper learning-by-doing structure. But I also expect ASI could find and implement a more efficient intelligence algorithm once ASI exists.
This paragraph tries to provide some data for a probability estimate of this point. AI as a field has been around at least since the Dartmouth conference in 1956. In this time we’ve had ELIZA, Deep Blue, Watson, and now transformer-based models including OpenAI o3-pro. In support of Steven’s position, one could note that AI research publication rates are much higher now than during the previous 70 years, but at the same time many AI ideas have already been explored, and the current best results come from models based on the 8-year-old “Attention is all you need” paper. To get a sense of the research rate, note that the doubling time for AI/ML research papers per month was about 2 years between 1994 and 2023, according to this Nature paper. Hence, every 2 years we produce about as many papers as were created in the previous 70 years. I don’t expect this doubling to continue forever, but certainly many new ideas are being explored now. If a ‘simple model’ for AI exists and its discovery is, say, randomly positioned on one of the AI/ML research papers published between 1956 and the achievement of ASI, then one could estimate the probability of the discovery’s position using this simplistic research model. If ASI is only 6 years out and the doubling every 2 years continues, then almost 90% of the AI/ML research papers published before ASI are still in the future. Even though many of these papers are LLM-focused, there is still active work in alternative areas. But even though the foundational paper for ASI may yet be in our future, I would expect something like a ‘complex’ ML model to win out (for example, Yann LeCun’s ideas involve differentiable brain modules). And the solution may or may not be more compute-intensive than current models. Brain-compute estimates vary widely, and the human brain has been optimized by evolution for many generations. In short, it seems reasonable to expect another key idea before ASI, but I would not expect it to be a simple model.
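As a sanity check on that “almost 90%” figure, here is the back-of-envelope arithmetic, assuming a clean 2-year doubling and a 6-year horizon (my own simplification, not numbers taken from the Nature paper):

```python
# If N papers exist today and output doubles every 2 years, then by ASI (6 years = 3 doublings
# away) the cumulative total is 2^3 * N = 8N, so the fraction still to come is (8N - N)/8N = 7/8.
doublings = 6 / 2                                    # years to ASI divided by doubling period
fraction_still_to_come = (2**doublings - 1) / 2**doublings
print(f"{fraction_still_to_come:.1%}")               # prints 87.5%
```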
The brain is not simple and I don’t expect to find it simple once we understand how it works.
There is an incoherence in these sections: you justify the existence of a “core of intelligence” simpler than LLMs by pointing at brains that are messier than LLMs.
I too believe that there will be a future paradigm vastly more efficient than LLMs + RLVR.
This is part of the reason that I believe “AI Governance” would actually shorten timelines! Instead of spending vast amounts of money training bigger models, AGI labs would be forced to innovate due to the compute limit...
Conclusion from reading this:
My modal scenario in which LLMs become dangerously superintelligent is one where language is a good enough platform to think, memory use is like other tool use (so an LLM can learn to use memory well, enabling e.g. continuous on-the-job learning), and verification is significantly easier than generation (allowing a self-improvement cycle in training).
But perhaps actual existing efforts to hype up LLMs are helping? I am sympathetic to François Chollet’s position:
This post inspires two lines of thought for me.
If we’re thinking of the computing / training effort to get to that point “from scratch”, how much can we include? I have Newton’s “standing on the shoulders of giants” quote in mind here. Do we include the effort necessary to build the external repositories of knowledge and organizational structures of society that make it possible to build these $1B/year companies within a modern human lifetime and with our individually computationally-limited human brains? Do we expect the “brain-like” (in terms of computational leanness) AGI to piggy-back on human structures (which maybe brings it closer to LLM-like imitation machines) or essentially invent its own society? In the latter case, are there potential weaknesses within this organization, in the same way that collective action in humans is hard?
The second line is about learning speed and wall-clock time. Of course AI can communicate and compute orders of magnitude faster than humans, but there are other limiting factors on learning rate. At some point, the AI has to go beyond the representations that can be found or simulated within the digital world and get its own data / do its own experiments in the outside world. Then, it has to deal with the inevitable latencies of the real world: the time between an intervention and the response you can learn from, which can be rather long depending on what natural or human system you’re studying.
Yup, I addressed that in §1.8.1:
It continues on (and see also §3.2 here). I agree that ASI won’t already know things about the world that it has no way of knowing. So what? Well, I dunno, I’m not sure why you brought it up. I’m guessing we have some disagreements about e.g. how quickly and easily an ASI could wipe out humans and survive on its own? Or were you getting at something else?
Maybe the problem is that we don’t have a good metaphor for what the path for “rapidly shooting past human-level capability” is like in a general sense, rather than on a specific domain.
One domain-specific metaphor you mention is AlphaZero, but games like chess are an unusual domain of learning for the AI, because it doesn’t need any external input beyond the rules of the game and objective, and RL can proceed just by the program playing against itself. It’s not clear to me how we can generalize the AlphaZero learning curve to problems that are not self-contained games like that, where the limiting factor may not be computing power or memory, but just the availability (and rate of acquisition) of good data to do RL on.
In §1.8.1 I also mentioned going from zero to beyond-world-expert-level understanding of cryptocurrency over the course of 24 hours spent reading up on the topic and all its prerequisites, and playing with the code, etc.
And in §3.2 here I talked about a hypothetical AI that, by itself, is running the equivalent of the entire global human R&D enterprise, i.e. the AI is running hundreds of thousands of laboratory experiments simultaneously, 24 hours a day, publishing millions of papers a year, and synthesizing brilliant new insights in every field at once.
Can we agree that those examples would be “rapidly shooting past human-level capability”, and would not violate any laws of probability by magically acquiring knowledge that it has no way to know?
With regard to the super-scientist AI (the global human R&D equivalent), wouldn’t we see it coming based on the amount of resources it would need to hire? Are you claiming that it could reach the required AGI capacity in its “brain in a box in a basement” state and only afterwards scale up in terms of resource use? The part I’m most skeptical about remains the idea that the resource use needed to get to human-level performance is minimal if you just find the right algorithm, because in my view this neglects the evaluation step in learning, which can be resource-intensive from the start and maybe can’t be done “covertly”.
---
That said, I want to stress that I agree with the conclusion:
But then, if AI researchers believe a likely scenario is:
Does that imply that the people who work on technical alignment, or at least their allies, also need to put effort into “winning the race” for AGI? It seems the idea that “any small group could create this with no warning” could motivate acceleration in that race even from people who are well-meaning in terms of alignment.
Farming does have a straightforward connection to techniques used by hunter-gatherers to gather plants more effectively. From page 66 of “Against the Grain: A Deep History of the Earliest States” by James C. Scott: