Continual learning as some sort of infinite context with little degradation in quality is very different from solving the “first day on the job” problem. I think the latter would need RL targeted at the obscure things a given agent instance deals with, while the former is more likely to arrive in 2026-2027. If that's the case, even the revenue growth from “continual learning” (in this initial form) might be modest, and its impact on AGI timelines smaller still.
And then the next thing might be more general RL at training time (with something like next-word-prediction RLVR) that doesn't depend on manually crafted tasks and specialized RL environments, smoothing out the jaggedness of “manual” RLVR for everything except the actual “first day on the job” aspect. This would essentially leapfrog pretraining quality in a way that can't be done with current methods and the available natural text data, but it won't improve the quality of test-time continual learning (other than by making in-context learning better in general). Such models would be more well-rounded at everything standard, but would still fall short of humans at adapting to specific jobs and at developing deep skills that either have no general applicability or are needed to solve one particular difficult problem.
Another issue with continual learning is that it likely doesn’t have the efficiency of today’s cloud-based LLMs:
Many ideas in the vicinity of continual learning by design don't involve full fine-tuning where every weight of the model changes, and those that do could probably still be made almost as capable with LoRA. Given the practical importance of updating maybe 100x fewer parameters than the full model (or fewer still), so that batched processing of user requests keeps working the same way it does with a per-request KV-cache, I think the first methods dubbed “continual learning” will be doing exactly this.
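A minimal sketch of why low-rank updates fit batched serving, assuming a single linear layer stands in for the whole model and using hypothetical names (`Adapter`, `batched_forward`, `adapter_sgd_step`) that don't come from any particular library. The frozen base matrix `W` is shared by every request in a batch, while each request carries its own small LoRA-style adapter, analogous to its own KV-cache; "learning" touches only that request's adapter. It also computes the parameter ratio for a rank-`r` adapter on a d×d matrix, d/(2r), which for d = 4096 and r = 16 comes to 128x, in the ballpark of the "100x fewer parameters" figure. Real servers would run the shared matmul once for the whole batch and use specialized kernels for the per-request adapter products; this toy loops per request for clarity.

```python
"""Toy model of "continual learning" via per-request low-rank adapters.

Assumptions (hypothetical, for illustration only): one linear layer stands in
for the whole model, and a LoRA-style adapter delta_W = B @ A plays the role of
each request's private learned state, the way a KV-cache is private state today.
"""
import random


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_param_ratio(d_in, d_out, rank):
    # Full d_out x d_in matrix vs. LoRA factors A (rank x d_in) and B (d_out x rank).
    return (d_in * d_out) / (rank * (d_in + d_out))


class Adapter:
    """Per-request low-rank update; delta_W = B @ A is never materialized."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_out)]

    def apply(self, x):
        return matvec(self.B, matvec(self.A, x))


def batched_forward(W, xs, adapters):
    # The frozen base weights W are shared by every request in the batch;
    # each request then adds its own low-rank correction (none if adapter is None).
    outputs = []
    for x, ad in zip(xs, adapters):
        base = matvec(W, x)
        delta = ad.apply(x) if ad is not None else [0.0] * len(base)
        outputs.append([b + d for b, d in zip(base, delta)])
    return outputs


def adapter_sgd_step(W, ad, x, target, lr):
    # One online SGD step on loss 0.5 * ||W x + B A x - target||^2.
    # W stays frozen; only this request's A and B change.
    h = matvec(ad.A, x)
    y = [b + d for b, d in zip(matvec(W, x), matvec(ad.B, h))]
    e = [yi - ti for yi, ti in zip(y, target)]
    # dL/dB = outer(e, h); dL/dA = outer(B^T e, x), using B before its update.
    bt_e = [sum(ad.B[i][k] * e[i] for i in range(len(e))) for k in range(len(h))]
    for i in range(len(ad.B)):
        for k in range(len(h)):
            ad.B[i][k] -= lr * e[i] * h[k]
    for k in range(len(ad.A)):
        for j in range(len(x)):
            ad.A[k][j] -= lr * bt_e[k] * x[j]
    return 0.5 * sum(ei * ei for ei in e)  # loss before this step


if __name__ == "__main__":
    rng = random.Random(0)
    d, rank = 8, 2
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    adapters = [Adapter(d, d, rank, rng), Adapter(d, d, rank, rng), None]
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]
    target = [rng.gauss(0, 1) for _ in range(d)]

    # Request 0 "learns" from its own feedback; W and the other adapters are untouched.
    losses = [adapter_sgd_step(W, adapters[0], xs[0], target, 0.01) for _ in range(200)]
    print(f"request 0 loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(len(batched_forward(W, xs, adapters)), "outputs from one shared W")
    print(lora_param_ratio(4096, 4096, 16))  # 128.0
```

The design choice being illustrated is that per-request state stays small and separable from the shared weights; a full-weight update for one request would instead require giving that request its own copy of `W`, which breaks the shared batch.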
Maybe at some point there will be an “agent swarm” use case where all the requests in a batch are working on the same problem for the same user, so that their shared full model can keep being updated in sync for that single problem. But this seems niche enough that it won't be the first thing to get deployed, and it's only relevant if the continual learning method involves updating the full weights at all.