In his TED talk, Eliezer guesses superintelligence will arrive after “zero to two more breakthroughs the size of transformers.” I’ve heard others voice similar takes. But I haven’t heard much discussion of which number it is.[1]
There is an enormous difference between the world where there are 0 insights left before superintelligence and the world in which there are one or more remaining. Specifically, this is the difference between a soft and a hard takeoff, because of what we might call a “cognitive capability overhang”.
The current models are already superhuman in several notable ways:
Vastly superhuman breadth of knowledge
Effectively superhuman working memory
Superhuman thinking speed[2]
If there’s a secret sauce that is missing for “full AGI”, then the first AGI might have all of these advantages, and more, out of the gate.
It seems to me that there are at least two possibilities.
We may be in world A:
We’ve already discovered all the insights and invented the techniques that earth is going to use to create its first superintelligence in this timeline. It’s something like transformers pre-trained on internet corpuses, and then trained using RL from verifiable feedback and on synthetic data generated by smarter models.
That setup basically just works. It’s true that there are relevant capabilities that the current models seem to lack, but those capabilities will fall out of scaling, just as so many others have already.
We’re now in the process of scaling it up, and when we do, we’ll produce our first AGI within a small number of OOMs.
...or we might be in world B:
There’s something that LLM-minds are basically missing. They can and will become superhuman in various domains, but without that missing something, they won’t become general genius scientists that can do the open-ended “generation, selection, and accumulation” process that Steven Byrnes describes here.
There’s at least one more technique that we need to add to the AI training stack.
Given possibility A, I expect that our current models will gradually (though not necessarily slowly!) become more competent and more coherent at executing long-term tasks. Each successive model generation / checkpoint will climb up the “autonomous execution” ladder (from “intern” to “junior developer” to “senior developer” to “researcher” to “research lead” to “generational researcher”).
This might happen very quickly. Successive generations of AI might traverse the remaining part of that ladder in a period of months or weeks, inside of OpenAI or Anthropic. But it would be basically continuous.
Furthermore, while the resulting models themselves might be relatively small, a huge and capex-intensive industrial process would be required for producing those models, which provides affordances for governance to clamp down on the creation of AGIs in various ways, if it chooses to.
If, however, possibility B holds instead and the training processes that we’re currently using are missing some crucial ingredient for AGI, then at some point, someone will come up with the idea for the last piece, and try it.[3]
That AI will be the first, nascent, AGI system that is able to do the whole loop of discovery and problem solving, not just some of the subcomponents of that loop.[4]
But regardless, these first few AGIs, if they are incorporating developments from the past 10 years, will be “born superhuman” along all the dimensions in which AI models are already superhuman.
That is: the first AGI that can do human-like intellectual work will also have an encyclopedic knowledge base, a superhuman working memory capacity, and superhuman speed.
Even though it will be a nascent baby mind, the equivalent of GPT-2 in its own new paradigm, it might already be the most capable being on planet earth.
If that happens (and it is a misaligned consequentialist), I expect it to escape from whatever lab developed it, copy itself a million times over, quickly develop a decisive strategic advantage, and seize control over the world.
It likely wouldn’t even need time to orient to its situation: it already has vast knowledge about the world, so it might not need to spend time or thought identifying its context, incentives, and options. It might know what it is and what it should do from its first forward pass.
In this case, we would go from a world populated by humans with increasingly useful but basically narrowly-competent AI tools, to a world with a superintelligence on the loose, in the span of hours or days.
Governance work to prevent this might be extremely difficult, because the process that produces that superintelligence is much more loaded on a researcher having the crucial insight than on any large-scale process that can be easily monitored or regulated.
If I knew which world we lived in, it would probably impact my strategy for trying to make things go well.
Which, to be clear, is probably reasonable: anyone who has a confident take about this probably has it because they think they know something about what’s required for superintelligence. And if so, many of them are practicing the virtue of silence about that.
And we’re not done yet: the kind of scaling and iteration that we’re doing improves the models on these and other dimensions.
For concreteness, let’s say that it’s something like “continuous learning that allows a deployed AI to refactor its concepts based on its thinking and experiences”, or even just some clever kind of agent scaffold.
The very first version might not be that good at it, but if it does something impressive that prior models have not been able to do, it will be tempting to scale it up: using the new technique on a bigger model, training it for longer, etc.
Why not think that the new paradigm/insight would in practice be much more continuous? E.g., you first invent a shitty version of it which creates some improvement on existing methods, then you make a somewhat better version, and so on.
I think there are sometimes large breakthroughs which come all in a small period of time (e.g. a month), but usually things are more incremental. For instance, “reasoning models” was arguably the largest publicly known breakthrough of the last 1.5 years, and it seems very continuous. (Note that even as of November 2023, OpenAI had some prototype of the relevant thing, and this is long before o1 came out.)
Things are also probably smoothed out some because you first test new improvements at smaller scale and companies only run big training runs periodically. (Though this can make things jumpier in some ways.)
I think we should put a bit of weight on “a big algorithmic breakthrough that occurs over the course of a month leads to very powerful AI, starting from well below that” (maybe like 10%) and more weight on “very powerful AI will emerge at a point when some shift in paradigm/algorithms invented within a year has made progress substantially faster for some potentially short period” (maybe like 40%, though I feel quite uncertain).
In his TED talk, Eliezer guesses superintelligence will arrive after “zero to two more breakthroughs the size of transformers.” I’ve heard others voice similar takes. But I haven’t heard much discussion of which number it is.
As an aside, I think that the amount of algorithmic efficiency improvement since transformers has arguably been much more than 2x the innovation that transformers were. E.g., Epoch estimates here that transformers were 23% of the algorithmic improvement that’s happened over the time period starting with their publication (which would make everything else roughly 77%, i.e. over 3x as much).
Also, note that “Attention is all you need” didn’t invent self-attention, but just demonstrated that you can make a language model with just self-attention (and MLPs) and no recurrence. And several papers had introduced self-attention (I think the previous year).
Both incremental improvements (compute scaling and small-to-moderate algorithmic improvements) and breakthroughs (such as significant “unhobblings” compared to human faculties) advance capabilities. Which of them is the last step that crosses the threshold to recursive self-improvement at AI speed could even be historically contingent, let alone knowable in advance.
So instead we can quantify the rate of incremental improvements (anticipating future conditions that change it), and point out specific important incapabilities of current AIs that might be targeted by possible breakthroughs. For incremental improvements, the funding (and then the industrial base) for the current super-fast compute scaling will be running out in 2027-2029 (absent AGI), and so compute scaling itself will slow down (by about 3x) after 2028-2030. This will in turn also slow down incremental algorithmic improvements (a bit later still), because they need compute for experiments: with a non-increasing amount of compute, the existing low-hanging fruit gets picked, and no new low-hanging fruit shows up from the independent source of greater compute scale. So if it’s 2029-2032, compute stopped rapidly scaling 2-3 years ago, and AIs still can’t do recursive self-improvement at AI speed, then the factor of incremental improvements becomes less important than it is now.
For the current reasoning LLMs, an important incapability is that they can’t (at all) deeply adapt to specific roles or sources of tasks; they are always on their “first day on the job”, even if they have relevant professional skills and copious notes from previous efforts. On a recent podcast, Dwarkesh Patel says that Sutskever’s SSI is rumored to be working on “test time training” (at 39:25). Another reason to think this “unhobbling” is plausible soon is that it might turn out to be possible to use agentic (tool-using) RLVR to train AIs to prepare datasets for finetuning variants of themselves (not necessarily with RLVR) that will then do better at particular tasks. So if 2-3 years pass and no version of this happens, it might stop being an important candidate for the next plausible breakthrough. And once at least something with potential is done in this direction, it will also fall under the influence of incremental improvements.
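To make the shape of that speculated setup concrete, here is a minimal sketch of the outer loop in Python. Every name in it (the data-proposing agent, the finetuning and evaluation helpers, the reward definition) is a hypothetical stand-in for illustration, not anyone’s actual method, and the agent’s tool use is elided.

```python
# Hypothetical sketch of the loop described above: an agent prepares a
# finetuning dataset for a variant of its base model, and is rewarded
# (RLVR-style) by how much that variant improves on a verifiable task.
# All helper names are stand-ins, not real library APIs.

from dataclasses import dataclass


@dataclass
class TaskResult:
    score: float  # e.g. fraction of verifiable checks passed


def propose_finetuning_examples(agent_policy, task_description, n=1000):
    """The data-preparing agent drafts candidate training examples for the task."""
    return [agent_policy.generate_example(task_description) for _ in range(n)]


def finetune_variant(base_model, examples):
    """Produce a task-specialized variant of the base model (plain supervised finetuning)."""
    return base_model.finetuned_on(examples)


def evaluate_on_task(model, task_suite):
    """Run the verifiable task suite and return the pass rate."""
    return TaskResult(score=task_suite.pass_rate(model))


def rlvr_data_prep_step(agent_policy, base_model, task_description, task_suite, optimizer):
    """One outer-loop step: reward the agent by how much its finetuned variant
    improves over the base model on the verifiable task."""
    baseline = evaluate_on_task(base_model, task_suite).score
    examples = propose_finetuning_examples(agent_policy, task_description)
    variant = finetune_variant(base_model, examples)
    improved = evaluate_on_task(variant, task_suite).score
    reward = improved - baseline  # verifiable signal: did the variant actually get better?
    optimizer.update(agent_policy, reward)  # ordinary RL policy update on that reward
    return reward
```

The point is just the structure: the reward to the data-preparing policy is a verifiable before/after comparison, so the usual RLVR machinery could in principle be aimed at “make a better specialist variant of yourself” rather than at the end task directly.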