Steven: as someone who has read all your posts and agrees with you on almost everything, this is a point where I have a clear disagreement with you. When I switched from neuroscience to doing ML full-time, some of the stuff I read to get up to speed was people theorizing about impossibly large (infinite or practically so) neural networks. I think that the literature on this does a pretty good job of establishing that, in the limit, neural networks can compute any sort of function. Which means that they can compute all the functions in a human brain, or a set of human brains. Meaning, it’s not a question of whether scaling CAN get us to AGI. It certainly can. It’s a question of when. There is inefficiency in trying to scale an algorithm which tries to brute-force learn the relevant functions rather than have them hardcoded in via genetics. I think that you are right that there are certain functions the human brain does quite well that current SoTA LLMs do very poorly. I don’t think this means that scaling LLMs can’t lead to a point where the relevant capabilities suddenly emerge. I think we are already in a regime of substantial compute and data overhang for AGI, and that the thing holding us back is the proper design and integration of modules which emulate the functions of parts of the brain not currently well imitated by LLMs. Like the reward and valence systems of the basal ganglia, for instance. It’s still an open question to me whether we will get to AGI via scaling or algorithmic improvement. Imagine for a moment that I am correct that scaling LLMs could get us there, but also that a vastly more efficient system which borrows more functions from the human brain is possible. What might this scenario look like?
Perhaps an LLM gets strong enough to, upon human prompting and with human assistance, analyze the computational neuroscience literature and open source code, extract useful functions, and then do some combination of intuitively improving their efficiency and brute-force testing them in new experimental ML architectures. This is not so big a leap from what GPT-4 is capable of. I think that that’s plausibly even a GPT-5 level of skill. Suppose also that these new architectures can be added onto existing LLM base models, rather than needing the base model to be retrained from scratch. As some critical number of discoveries accumulates, GPT-5 suddenly takes a jump forward in efficacy, enabling it to process the rest of the potential improvements much faster, and then it takes another big jump forward, and then is able to rapidly self-improve with no further need for studying existing published research. In such a scenario, we’d have a foom over the course of a few days which could take us by surprise and lead to a rapid loss of control. This is exactly why I think Astera’s work is risky, even though their current code seems quite harmless on its own. I think it is focused on (some of) the places where LLMs do poorly, but also that there’s nothing stopping the work from being effectively integrated with existing models for substantial capability gains. This is why I got so upset with Astera when I realized during my interview process with them that they were open-sourcing their code, and also when I carefully read through their code and saw great potential there for integrating it with more mainstream ML to the empowerment of both.
Literature examples of the sort of thing I’m talking about with ‘enough scaling will eventually get us there’, even though I haven’t read this particular paper: https://arxiv.org/abs/2112.15577
https://openreview.net/forum?id=HyGBdo0qFm
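To make the ‘in the limit, neural networks can compute any sort of function’ claim a bit more concrete, here is a minimal Python sketch. It is a toy illustration and is not taken from either linked paper. It hand-constructs a one-hidden-layer ReLU network that linearly interpolates a target function at evenly spaced knots, and measures how the worst-case error shrinks as the hidden layer gets wider. The names `build_relu_net` and `max_error`, the choice of `sin` as the target, and the widths are all assumptions made for illustration. Note the limits of what this shows: it only demonstrates representational capacity for a continuous 1-D function on a bounded interval. It says nothing about whether training would find such weights, or how much compute and data that would take, which is the ‘when’ part of the question.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def build_relu_net(f, lo, hi, width):
    """Hand-construct (no training) a one-hidden-layer ReLU net that
    linearly interpolates f at width + 1 evenly spaced knots on [lo, hi]."""
    step = (hi - lo) / width
    knots = [lo + i * step for i in range(width + 1)]
    # Slope of the interpolant on each segment between neighbouring knots.
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(width)]
    # Hidden unit i computes relu(x - knots[i]); its output weight is the
    # change in slope at that knot, so the slopes add up segment by segment.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    biases = knots[:width]
    offset = f(lo)

    def net(x):
        return offset + sum(w * relu(x - b) for w, b in zip(out_weights, biases))

    return net

def max_error(f, net, lo, hi, samples=2001):
    """Largest absolute error between f and net on an even grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - net(x)) for x in xs)

LO, HI = 0.0, 2.0 * math.pi
errors = {
    width: max_error(math.sin, build_relu_net(math.sin, LO, HI, width), LO, HI)
    for width in (4, 16, 64)
}
for width, err in errors.items():
    print(f"hidden units: {width:3d}  max error: {err:.5f}")
```

The error falls roughly with the square of the knot spacing here, so each 4x increase in width buys about a 16x reduction in worst-case error for this particular smooth target. Less smooth or higher-dimensional targets can need far more units, which is one way to see the inefficiency of the brute-force route.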
Paul: I think you are making a valid point here. I think your point is (sadly) getting obscured by the fact that our assumptions have shifted under our feet since the time when you began to make your point about slow vs fast takeoff.
I’d like to explain the point I think you are right about, and then try to describe why I think we need a new framing for the next set of predictions.
Several years ago, Eliezer and MIRI generally were frequently emphasizing the idea of a fast take-off that snuck up on us before the world had been much changed by narrow AI. You correctly predicted that the world would indeed be transformed in a very noticeable way by narrow AI before AGI. Eliezer in discussions with you has failed to acknowledge ways in which his views shifted from what they were ~10 years ago towards your views. I think this reflects poorly on him, but I still think he has a lot of good ideas, and made a lot of important predictions well in advance of other people realizing how important this was all going to be. As I’ve stated before, I often find my own views seeming to be located somewhere in-between your views and Eliezer’s wherever you two disagree.
I think we should acknowledge your point that the world is being changed in a very noticeable way by AI before true AGI, just as you have acknowledged Eliezer’s point that once a runaway out-of-human-control human-out-of-the-loop recursive-self-improvement process gets started it could potentially proceed shockingly fast and lead to a loss of humanity’s ability to regain control of the resulting AGI even once we realize what is happening. [I say Eliezer’s point here, not to suggest that you disagreed with him on this point, but simply that he was making this a central part of his predictions from fairly early on.]
I think the framing we need now is: how can we predict, detect, and halt such a runaway RSI process before it is too late? This is important to consider from multiple angles. I mostly think that the big AI labs are being reasonably wary about this (although they certainly could do better). What concerns me more is the sort of people out in the wild who will take open source code and do dumb or evil stuff with it, Chaos-GPT-style, for personal gain or amusement. I think the biggest danger we face is that affordable open-source models seem to be lagging only a few years behind SotA models, and that the world is full of chaotic people who could (knowingly or not) trigger a runaway RSI process if such a thing is cheap and easy to do.
In such a strategic landscape, it could be crucially important to figure out how to:
a) slow down the progress of open source models, to keep dangerous runaway RSI from becoming cheap and easy to trigger;
b) use SotA models to develop better methods of monitoring and preventing anyone outside a reasonably-safe-behaving org from doing this dangerous thing;
c) improve the ability of the reasonable orgs to self-monitor and notice the danger before they blow themselves up.
I think that it does not make strategic sense to actively hinder the big AI labs. I think our best move is to help them move more safely, while also trying to build tools and regulation for monitoring the world’s compute. I do not think there is any feasible solution for this which doesn’t utilize powerful AI tools to help with the monitoring process. These AI tools could be along the lines of SotA LLMs, or something different like an internet police force made up of something like Conjecture’s CogEms. Or perhaps some sort of BCI or gene-mod upgraded humans (though I doubt we have time for this).
My view is that algorithmic progress, pointed to by neuroscience, is on the cusp of being discovered, and that those insights, if published, will make powerful AGI cheap and available to all competent programmers everywhere in the world. With so many people searching, and the necessary knowledge so widely distributed, I don’t think we can count on keeping this under wraps forever. Rather than have these insights get discovered and immediately shared widely (e.g. by some academic eager to publish an exciting paper who didn’t realize the full power and implications of their discovery), I think it would be far better to have a safety-conscious lab discover these and have a way to safely monitor themselves, so that they notice the danger and potential power of what they’ve discovered. They can then keep the discoveries secret and collaborate with other safety-conscious groups and with governments to set up the worldwide monitoring we need to prevent a rogue AGI scenario. Once we have that, we can move safely to the long reflection and take our time figuring out better solutions to alignment. [An important crux for me here is that I believe that if we have control of an AGI which we know is potentially capable of recursively self-improving beyond our ability to control it, we can successfully utilize this AGI at its current level of ability without letting it self-improve. If someone convinced me that this was untenable, it would change my strategic recommendations.]
As @Jed McCaleb said in his recent post, ‘The only way forward is through!’ https://www.lesswrong.com/posts/vEtdjWuFrRwffWBiP/we-have-to-upgrade
As you can see from this prediction market I made, a lot of people currently disagree with me. I expect this will be a different looking distribution a year from now.
https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg
Here’s an intuition pump analogy for how I’ve been thinking about this. Imagine that I, as someone with a background in neuroscience and ML, was granted the following set of abilities. Would you bet that I, with this set of abilities, would be able to do RSI? I would.
Abilities that I would have if I were an ML model trying to self-improve:
Make many copies of myself, and checkpoints throughout the process.
Work at high speed and in parallel with copies of myself.
Read all the existing scientific literature that seemed potentially relevant.
Observe all the connections between my neurons, all the activations of my clones as I expose them to various stimuli or run them through simulations.
Ability to edit these weights and connections.
Ability to add neurons (up to a point) where they seemed most needed, connected in any way I see fit, initialized with whatever weights I choose.
Ability to assemble new datasets and build new simulations to do additional training with.
Ability to freeze some subsection of a clone’s model and thus more rapidly train the remaining unfrozen section.
Ability to take notes and write collaborative documents with my clones working in parallel with me.
Ok. Thinking about that set of abilities, doesn’t it seem like a sufficiently creative, intelligent, determined general agent could successfully self-improve? I think so. I agree it’s unclear where the threshold is exactly, and when a transformer-based ML model will cross that threshold. I’ve made a bet at ‘GPT-5’, but honestly I’m not certain. Could be longer. Could be sooner...
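For readers less familiar with ML, three of the abilities listed above (checkpointing, adding neurons with chosen weights, and freezing part of a model while training the rest) are routine operations on ordinary networks. Here is a minimal Python sketch on a toy one-hidden-layer ReLU network stored as plain lists. Everything in it is hypothetical and chosen for illustration: the dict layout, the function names (`checkpoint`, `add_neuron`, `train_step`), and the numbers. It is not a claim about how any real system would or does self-modify, only a picture of what these operations look like mechanically.

```python
import copy

def forward(model, x):
    """Toy network: scalar input, one hidden ReLU layer, scalar output."""
    hidden = [max(0.0, w * x + b) for w, b in zip(model["w_in"], model["b"])]
    return sum(v * h for v, h in zip(model["w_out"], hidden))

def checkpoint(model):
    """Independent snapshot, so later edits cannot touch the saved copy."""
    return copy.deepcopy(model)

def add_neuron(model, w_in, b, w_out=0.0):
    """Return a widened copy. With w_out == 0.0 the new unit contributes
    nothing yet, so the widened network computes the same function."""
    wider = checkpoint(model)
    wider["w_in"].append(w_in)
    wider["b"].append(b)
    wider["w_out"].append(w_out)
    return wider

def train_step(model, x, target, lr, frozen):
    """One squared-error gradient step on the output weights only,
    skipping every hidden unit whose index is in `frozen`."""
    hidden = [max(0.0, w * x + b) for w, b in zip(model["w_in"], model["b"])]
    err = forward(model, x) - target
    for i, h in enumerate(hidden):
        if i not in frozen:
            model["w_out"][i] -= lr * 2.0 * err * h  # d(err^2)/d(w_out[i])

base = {"w_in": [1.0, -1.0], "b": [0.0, 0.5], "w_out": [0.5, 0.25]}
saved = checkpoint(base)

wider = add_neuron(base, w_in=2.0, b=-1.0)  # new unit starts silent
before = forward(wider, 1.0)

# Freeze the two original units; only the newly added one is trained.
train_step(wider, x=1.0, target=2.0, lr=0.1, frozen={0, 1})
after = forward(wider, 1.0)

print("before:", before, "after:", after, "original untouched:", base == saved)
```

Starting the new unit with a zero output weight is one simple way to make the edit function-preserving, so the widened copy can be compared against its checkpoint before any further training. Freezing the original units means each step updates fewer parameters, which is the speed-up the list item refers to.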
Sorry @the gears to ascension. I know your view is that it would be better for me to be quiet about this, but I think the benefits of speaking up in this case outweigh the potential costs.
oh, no worries, this part is obvious