I believe that the biggest bottleneck for continual learning is data.
First, I am defining continual learning (CL) as extreme long-context modelling with particularly good in-context supervised and reinforcement learning. You seem to have a similar implied definition, given that Titans and Hope are technically sequence modelling architectures more than classical continual learning architectures.
Titans might already be capable of crudely performing CL as I defined it, but we wouldn’t know, because we haven’t trained it on data that looks like CL. The long-context data we currently use looks like PDFs, books, and synthetically concatenated snippets. If you saw a model producing that kind of data, you wouldn’t consider it CL: it doesn’t contain failures, feedback, and an entity learning from them. If we trained the architecture on data that actually looks like CL (data that currently doesn’t exist, at least publicly), then I think we would have CL.
The obvious solution to this problem is to collect better data. This would be expensive, but the big players could probably afford it.
Another solution that I see is to bake a strong inductive bias into the architecture. If CL is an out-of-distribution behavior relative to the training data, then the best option is an architecture that “wants” to exhibit CL-like behavior. Taken to the extreme, such an architecture would exhibit CL-like behavior without any prior training at all. One example would be an “architecture” that just fine-tunes a sliding-window transformer on the stream of context. Of the current weight-based architectures, I think E2E-TTT is the closest to this vision, since it is essentially meta-learned fine-tuning.
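To make the "fine-tune on the stream of context" idea concrete, here is a minimal toy sketch. Everything in it is hypothetical: a real version would apply SGD updates to a sliding-window transformer's weights, whereas this stand-in uses a trivial bigram counter so the online-update loop is visible without any deep-learning dependencies.

```python
# Toy sketch of test-time training on a streaming context.
# Hypothetical stand-in: a bigram counter plays the role of the model
# whose weights would be fine-tuned online in the real architecture.

from collections import defaultdict


class StreamingBigram:
    """Stand-in for a model updated online as context arrives."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, window):
        # The "fine-tuning step": absorb the latest context window
        # into the model's parameters (here, transition counts).
        for prev, nxt in zip(window, window[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, token):
        # Greedy next-token prediction from what has been learned so far.
        nxt = self.counts.get(token)
        if not nxt:
            return None
        return max(nxt, key=nxt.get)


def run_stream(stream, window_size=4):
    """Slide over the stream, updating the model one window at a time."""
    model = StreamingBigram()
    for i in range(0, len(stream), window_size):
        # Overlap windows by one token so no bigram is dropped at a boundary.
        start = i - 1 if i else 0
        model.update(stream[start : i + window_size])
    return model


model = run_stream(list("abababab"))
print(model.predict("a"))  # the stream taught the model that "a" is followed by "b"
```

The point of the sketch is only the control flow: the model's parameters are mutated by the stream itself rather than fixed after pretraining, which is the property the inductive-bias argument above is after.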
The final solution is to use reinforcement learning instead of pretraining to get CL abilities. If getting high rewards necessitates CL, then we would expect RL to eventually bake in continual learning. The problem is that RL is just so costly and inefficient, and we lack open-ended environments with unhackable rewards.
I don’t think I understand. I think of the data for continual learning as coming from deployment: sessions and evaluations of what’s worth learning/remembering. Are you referring to data appropriate for learning-to-learn during initial training?
I agree that that’s scarce, and it would be nice to have. Architectures like Transformers and Hope need to learn to learn, and humans do too, to some extent. But to some extent we have built-in emotional registers for what’s important to learn: what’s surprising and salient. Loosely similar mechanisms might work for continual learning.
Yes, I am referring to the lack of learning-to-learn data during initial training.
Your point that humans have built-in mechanisms for continual learning is similar to what I’m saying about inductive biases: if we don’t have the data to train continual learning into models, we need to build it into the architecture.
However, I think the ‘data’ from which humans learn during development (on-policy interactions with the environment, with constant feedback and something like rewards) is much better aligned with continual learning than books and PDFs are.