Continual learning might wake the world up to AGI, without yet bringing the dangers of AGI.
Pretraining gives shallow intelligence that is general. RL gives deep creative intelligence in a narrow skill, but it used to be very hard to make RL work well for most skills. RL with pretrained models, which is RLVR, makes RL robustly applicable to a wide variety of narrow skills. But it still needs to be applied manually: the skills it trains are hand-picked before deployment, and so the deep creative intelligence from RLVR remains jagged compared to the more general shallow intelligence from pretraining.
Continual learning has now been vaguely announced for 2026 by both Anthropic and GDM. If it merely provides adaptation for AI instances to the current situation or job at the shallow level of pretraining, it might still be significantly more economically valuable than the current crop of models, leading to more visible job displacement and waking up the public and the politicians to the importance of AI. Yet it doesn’t necessarily make direct progress in automating RLVR or in introducing some other way of making the creativity of RL general, and so the AIs won’t necessarily get notably more dangerous than today at the level of disempowerment or extinction.
Huh, I think continual learning would be a pretty big deal w.r.t. AI danger. Can you say more about why it wouldn’t? Seems like it would dramatically increase horizon lengths for example.
The point is the distinction between pretraining and RL (in the level of capability), and between manual jagged RLVR and hypothetical general RL (in the generality of capability). I think observing Opus 4.5 and Gemini 3 Pro is sufficient to be somewhat confident that even at 2026 compute levels pretraining itself won’t be sufficient for AGI (it won’t train sufficiently competent in-context learning behavior to let AIs work around all their hobblings), while IMO gold medal results (even with DeepSeek-V3 model size) demonstrate that RLVR is strong enough to get superhuman capabilities in the narrow skills it gets applied to (especially when it’s 1-4T active param models rather than DeepSeek-V3). So in the current regime (until 2029-2031, when yet another level of compute becomes available) AGI requires some kind of general RL, and continual learning doesn’t necessarily enable it on its own, even if it becomes very useful for the purposes of on-the-job training of AI instances.
This is more of a claim that timelines don’t get shorter within the 2026-2028 window because of continual learning, even if it’s understood as something that significantly increases AI adoption and secures funding for 5+ GW training systems by 2028-2030 (as well as rouses the public via job displacement). That is, starting with timelines without continual learning (appearing as its own thing, rather than an aspect of AGI), where AGI doesn’t appear in 2026-2028, I think adding continual learning (on its own) doesn’t obviously give AGI either, assuming continual learning is not actual automated RLVR (AIs applying RLVR automatically to add new skills to themselves). After 2029-2031, there are new things that could be attempted, such as next word prediction RLVR, and enough time will have passed that new ideas might be ready, so I’m only talking about the very near term.
As an aside, next word prediction RLVR has always struck me as a strange idea. If we’d like to improve the task of next token prediction, we know how to do that directly by scaling laws. That is, I’d be surprised if having the model think about the next token in a discrete space for N steps would beat making the model N times larger thinking in continuous space, since in the former case most of the computation of the forward pass is wasted. There are also practical difficulties; e.g. it’s bottlenecked on memory bandwidth and is harder to parallelize.
Next word prediction RLVR is massively more compute hungry per unit of data (than pretraining), so it’s both likely impractical at current levels of compute, and plausibly solves text data scarcity at 2028-2030 levels of compute if it’s useful. The benefit is generality of the objective, the same as with pretraining itself, compared to manual construction of RL environments for narrow tasks. Given the pretraining vs. RLVR capability gap, it’s plausibly a big deal if it makes RL-level capabilities as general as the current shallow pretraining-level capabilities.
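To put a rough number on "compute hungry per unit of data", here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 6N FLOPs per trained token for a model with N active parameters (about 2N forward, 4N backward), and it ignores attention cost over the longer context. The rollout count and thinking length are made-up illustrative values, not figures from any lab, and the function names are hypothetical.

```python
def pretraining_flops_per_token(active_params):
    # Rule of thumb: ~2N FLOPs forward + ~4N FLOPs backward per trained token.
    return 6 * active_params


def next_token_rlvr_flops_per_token(active_params, rollouts, thinking_tokens):
    # ASSUMPTION (illustrative only): for each data token we sample `rollouts`
    # reasoning traces of `thinking_tokens` tokens each (forward only, ~2N per
    # sampled token), then train on every trace plus its predicted token
    # (~6N per trained token).
    sampling = 2 * active_params * rollouts * thinking_tokens
    training = 6 * active_params * rollouts * (thinking_tokens + 1)
    return sampling + training


def compute_multiplier(rollouts, thinking_tokens, active_params=1e12):
    # Under these assumptions the ratio does not depend on model size.
    rlvr = next_token_rlvr_flops_per_token(active_params, rollouts, thinking_tokens)
    return rlvr / pretraining_flops_per_token(active_params)


if __name__ == "__main__":
    # Hypothetical settings: 8 rollouts of 100 thinking tokens per data token.
    print(compute_multiplier(rollouts=8, thinking_tokens=100))
```

With these made-up settings the multiplier comes out on the order of a thousand per data token, which is the sense in which the same text corpus could absorb far more compute than pretraining on it would.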
The fact that Łukasz Kaiser (transformer paper co-author, currently at OpenAI) is talking about it in Nov 2025 is strong evidence AI companies couldn’t yet rule out that it might work. The idea itself is obvious enough, but that’s less significant as evidence for its prospects.
I agree that it requires a lot of compute, but I think that misunderstands the objection. My claim is that for any level of compute, scaling parameters or training epochs using existing pretraining recipes will be more compute efficient than RLVR for the task of next token prediction. One reason is that by scaling models you can directly optimize the cross entropy objective through gradient descent, but with RLVR you have to sample intermediate tokens, and optimizing over these discrete tokens is difficult and inefficient. That being said, I could imagine there being some other objective besides next token prediction for which RLVR could have an advantage over pretraining. This is what I imagine Łukasz is working on.
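A toy sketch of the "direct gradient vs. sampled tokens" contrast, under heavy simplifying assumptions: a single categorical distribution over a four-token vocabulary stands in for the model, and the RL reward is 1 when the sampled token equals the target. This is an analogy only, not the actual next word prediction RLVR setup (there are no intermediate reasoning tokens here), and all function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, target):
    """Exact gradient of -log p[target] w.r.t. the logits: p - onehot(target)."""
    p = softmax(logits)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]


def expected_reward_grad(logits, target):
    """Exact gradient of E[reward] = p[target] w.r.t. the logits."""
    p = softmax(logits)
    return [p[target] * ((1.0 if i == target else 0.0) - p_i)
            for i, p_i in enumerate(p)]


def reinforce_grad(logits, target, n_samples, rng):
    """Score-function (REINFORCE) estimate of expected_reward_grad.

    Returns the averaged estimate and the number of samples that hit the
    target. Samples that miss get reward 0 and contribute no signal.
    """
    p = softmax(logits)
    grad = [0.0] * len(logits)
    hits = 0
    for _ in range(n_samples):
        a = rng.choices(range(len(p)), weights=p)[0]
        if a == target:
            hits += 1
            for i, p_i in enumerate(p):
                grad[i] += (1.0 if i == a else 0.0) - p_i
    return [g / n_samples for g in grad], hits


if __name__ == "__main__":
    logits, target = [2.0, 0.0, 0.0, -1.0], 3  # the target starts out unlikely
    print("cross-entropy grad:", cross_entropy_grad(logits, target))
    print("exact reward grad: ", expected_reward_grad(logits, target))
    print("sampled estimate:  ", reinforce_grad(logits, target, 1000, random.Random(0)))
```

In this toy case the exact reward gradient is the cross-entropy gradient scaled by `-p[target]`, so when the target is unlikely the sampled estimator mostly sees zero-reward draws, while the cross-entropy gradient is available in full from a single example.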
I disagree that it will ‘wake the world up’ in a more meaningful way than generic capabilities progress and broader adoption (i.e. ‘no change in trend’), and my guess is that you’re overestimating the AI literacy of the broader public and policymakers / how able they are to distinguish particular advancements from a general trend of progress.
I know a university professor (‘Bob’) who offered students extra credit if they:
1. Asked ChatGPT who the world’s leading expert was in Bob’s field.
2. Corrected it if it named anyone other than Bob.
3. Sent him a link to the chat log.
Not only did he think we’d unlocked continual learning; he thought that we’d unlocked continual learning across instances. This was in February 2024.
(maybe you’re thinking ‘oh, he just expected them to train on the chats later’, but this is explicitly not true; he thought they were ‘teaching’ the model that he was the world’s leading expert in his field, within that instance, and iterated on prompting techniques that he instructed his students to use, based on ‘experiments’ he ran)
There are similar stories involving policymakers and other powerful people that are easy to find. I just think the causal chain of people’s awareness is only very vaguely related to specific advancements at all (e.g. reasoning models didn’t make more news among ‘normal people’ than ‘GPT 5’, even though you could watch a machine think! which is surely an incredible headline).
I’m thinking of simply greater salience, compared to a more bearish trajectory with no continual learning (where chatbots are the new Google but people aren’t losing jobs all over the place). If there are objective grounds for a public outcry, more people will pay more attention, including politicians. What they’ll do with that attention is unclear, but I think continual learning has the potential for bringing significantly more attention in 2027-2028 compared to its absence, without yet an existential catastrophe or a straightforwardly destructive “warning shot”.
I think I’m misunderstanding you, so I’m going to try and talk it through.
Ok, so there are three possible states for the near-future:
1. Advances in continual learning
2. Advances in non-continual learning capabilities
3. Progress stops
My impression is that neither of us thinks 3 is very likely. I think there are plenty of capability improvements that could lead to generic increased saliency (and that massive automation and economic upheaval would happen even if progress were to ~halt, which is the main ‘generically increase saliency’ lever). So continual learning doesn’t seem special in that way.
Of course, it is very special, because it’s one of the Known Missing Ingredients, but it’s pretty easy for me to imagine you get something like continual learning but it’s so bad that you could have more efficiently converted resources into capabilities by continuing to dump money into RL, which is the kind of counterfactual that it makes sense to me to reason against here. So if we’re comparing scenario 1 (advances in continual learning) with scenario 2 (some other capabilities), I don’t immediately see a difference in saliency or safety between the two.[1]
Then there’s this additional complication, which is that you expect early continual learning to be shallow, which sets it apart from [other capabilities], in that it’s likely to be safer (less relevant to the AGI tech tree), while having outsized economic value relative to other non-AGI relevant capabilities (again, strong continual learning is of course relevant to AGI, but you don’t really think weak continual learning is, and it’s the weak one that we’re trying to talk about / expect could happen soon).
I guess I just mostly hope we have a new name for the upcoming instantiation of continual learning that isn’t continual learning, as well as a detailed public understanding of the implementation, which will make it much easier to evaluate how relevant it might be to (for instance) automating RLVR. I am pretty spooked that anything even vaguely resembling continual learning is on the menu for the near future, since the strong version of it is among the most important components of the true torment nexus.
Ok, reasoning through this more makes your initial claim both more interesting and more plausible than I initially thought. Let me know if it looks like I’m still not getting it. Thanks!
Basically agree with this in the near term, though I do think in the longer term, especially in the 2030s, continual learning will bring the dangers of AGI, and probably will lead to faster takeoffs than purely LLM-based takeoff worlds.
But yes, for at least the next 5 years, continual learning will differentially wake up the world to AGI without bringing the dangers of AGI. Unlike many on here, though, I don’t expect it to lead to policy that lets us reduce x-risk from AI much, for the reasons Anton Leicht states here. In short form: even if accelerationist power declines, this doesn’t necessarily mean AI existential safety can take advantage of it; AI safety money will decline as a percentage compared to money for various job protection lobbies; and while accelerationists won’t be able to defeat entire anti-AI bills, it will still remain easy for them to neuter AI safety bills enough to make the EV of politics for reducing existential risk either much less than that of technical AI safety, or even outright worthless/negative, depending on the politics of AI.
There’s a Dwarkesh quote on continual learning that I really want to emphasize here:
“Solving” continual learning won’t be a singular one-and-done achievement. Instead, it will feel like solving in context learning. GPT-3 demonstrated that in context learning could be very powerful (its ICL capabilities were so remarkable that the title of the GPT-3 paper is ‘Language Models are Few-Shot Learners’). But of course, we didn’t “solve” in-context learning when GPT-3 came out—and indeed there’s plenty of progress still to be made, from comprehension to context length. I expect a similar progression with continual learning. Labs will probably release something next year which they call continual learning, and which will in fact count as progress towards continual learning. But human level continual learning may take another 5 to 10 years of further progress.
[1] Although as I type this, I think I’m beginning to?
Yes. Right now, LLMs feel more like a tool than a mind or entity. Adding continual learning will make them feel more like humans, which is intuitively alarming. It will also broaden their deployment, another source of alarm. They’ll become more continuous, like a human, instead of an ephemeral ghost. More agentic behavior, as a result of improving competence by “learning on the job” (and other relevant improvements), will also push in that direction, making them seem intuitively more like humans. Humans are intuitively extremely dangerous. Weird alien versions of humans are intuitively even more alarming (if you’re not an AI enthusiast or engaged in a culture war with those pesky “doomers”).
I wrote about this in “A country of alien idiots in a datacenter: AI progress and public alarm”, focusing on impacts on public opinion. I wrote about the technical side more in “LLM AGI will have memory, and memory changes alignment”.
I think this will make progress toward RSI. It will grow into a major unhobbling for agent competence in all areas. But it will be slower progress, because we’ll have bad, limited continual learning before we have really good human-like continual learning. So I think it will unlock the dangers of AGI, but at a slower pace that will give us a fighting chance to wake up and take alignment seriously, barely in time.
I’m thinking of next-gen LLM agents with continual learning as parahuman AI: systems that work roughly like human brains/minds, and work alongside humans.