I agree that because base models can have such broad pre-existing knowledge and skills, can be trained with RL to manage context and memories externally, and have a super-human context window/working memory, weight-based continual learning isn’t strictly necessary to get to some fairly powerful systems, probably able to automate much of white-collar work. But I also suspect that weight-based continual learning may not be that far off, and that it could offer qualitative advantages leading to faster capability progress than we would otherwise see.
I recently wrote a post surveying some weight-based continual learning research that you may find interesting.