Excellent post! I agree that this deserves more attention in the alignment community than it’s getting. Continuous learning of some sort seems inevitable, and it seems likely to break a lot of load-bearing assumptions in the current standard thinking. Alignment is one, and rate of progress is another; you mention both.
What’s most salient to me is that we don’t know how much continual learning, or how good, would be enough to break those assumptions. It might take a lot or only a little for it to matter.
I have become somewhat more optimistic about our ability to align continuously-learning models since writing "LLM AGI will have memory" (thanks for the prominent citation). But that still leaves me not very optimistic.
The other form of weight-based continual learning is just doing fine-tuning on carefully selected bits of the model’s “experience”. This can be used to develop “skills”. It’s subject to large interference problems, but it’s potentially pretty cheap. And it could potentially be applied as a removable LoRA to keep base capabilities intact (I’m not sure to what degree this would actually work).
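To make the "removable LoRA" idea concrete, here's a minimal sketch using the Hugging Face peft library. The model name, target modules, and hyperparameters are placeholders, and the curation/training step is elided; this is an illustration of the mechanism, not a claim about how anyone actually does it:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical base model; any causal LM would do.
base = AutoModelForCausalLM.from_pretrained("gpt2")

# Train only small low-rank adapter matrices; the base weights stay frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["c_attn"])
model = get_peft_model(base, lora_cfg)

# ... fine-tune `model` here on carefully selected transcripts of the
# model's own "experience" (the curation is the hard part, not shown) ...

# The adapter is removable: disabling it restores the original behavior,
# so base capabilities remain intact even if the learned "skill" interferes.
with model.disable_adapter():
    pass  # inside this block the model behaves as the unmodified base model
```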
You mention the extreme usefulness to an organization of preserving its internal knowledge in weights rather than reloading it. I just want to emphasize that this would include solving Dwarkesh’s “perpetual first-day intern” problem; that tacit knowledge includes all of the knowledge of how the organization actually gets its work done. (And when it’s learned in weights, it should be thought of as skills as well as knowledge.)