Weight-updating continual learning needs to consist of both LoRA weights and data that can be used to retrain those LoRA weights on top of a different model (possibly also making use of the old model+LoRA as a teacher). It needs to be LoRA rather than full-model updating in order to preserve batch processing of requests from many individual users, since a shared base model can serve everyone while each user's adapter stays small. And there needs to be retained data for training LoRA on top of a new model, or else all adaptation/learning is lost on every (major) update of the underlying model.
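The transfer step above can be sketched in code. This is a minimal toy illustration, not a real API: a single linear layer stands in for the base model, the LoRA update is written out by hand, and the old base+LoRA acts as a teacher whose outputs the new adapter is distilled to match on the retained data. All names (`LoRALinear`, `teacher`, `student`, the synthetic `data`) are assumptions for the sketch.

```python
# Hypothetical sketch: carrying a LoRA adapter across a base-model update by
# distilling from (old base + old LoRA) as a teacher into a fresh LoRA on the
# new base. Toy single-layer setup; names and sizes are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
D = 16  # feature dimension of the toy "model"

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update x @ A.T @ B.T."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base stays shared/frozen, enabling batching
        self.A = nn.Parameter(torch.randn(rank, D) * 0.01)
        self.B = nn.Parameter(torch.zeros(D, rank))  # zero init: starts as identity update

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

old_base, new_base = nn.Linear(D, D), nn.Linear(D, D)
teacher = LoRALinear(old_base)   # stands in for the old model + old LoRA
student = LoRALinear(new_base)   # fresh LoRA on the updated base model

# Pretend the teacher's LoRA already encodes learned per-user adaptation.
with torch.no_grad():
    teacher.B.copy_(torch.randn_like(teacher.B) * 0.1)

# Distill: replay the retained data through the teacher, fit the student's LoRA.
data = torch.randn(256, D)  # stands in for the stored per-user dataset
opt = torch.optim.Adam([student.A, student.B], lr=1e-2)
with torch.no_grad():
    init_loss = nn.functional.mse_loss(student(data), teacher(data)).item()
for step in range(200):
    with torch.no_grad():
        target = teacher(data)
    loss = nn.functional.mse_loss(student(data), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
with torch.no_grad():
    final_loss = nn.functional.mse_loss(student(data), teacher(data)).item()
```

In a real system the teacher targets would be logits or hidden states of the full old model, and the retained `data` is exactly the memory store the post discusses; the point of the sketch is that only the low-rank parameters are trained while both base models stay frozen.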
Various memory/skill databases are already a thing in some form, and they will keep getting better; there's not going to be anything distinct enough in that space to be worth announcing as "continual learning". Weight-updating continual learning is much more plausibly the thing that can leapfrog incremental progress on tool-like memory, and so I think it's weight updating that gets to be announced as "continual learning". Though the data for retraining LoRA on top of a new underlying model could end up being largely the same thing as a tool-accessible memory database.
I think that SGD isn't sample-efficient enough to solve continual learning.