many approaches to making continual learning work do it via some form of intentionally sparsifying the gradient, or assigning neurons to topics, or similar: a narrower selectivity, so that information can't go everywhere and updates are localized to the relevant subcomponent. it works okay, and iirc there's reason to believe the brain does something similar. this is all from memory I haven't updated in about 2 years, so it might be wrong, but I have definitely seen papers that attempted things like this. to the degree I'm remembering something real, it's evidence for this actually being adaptive: you need to *not* update everything in order to not break your brain, and gradients updating everything is basically a bug. yes, things do propagate deeply, but preventing them from doing so is core to how you can learn new things without overwriting old ones. IIRC, anyway.
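to make the "localized updates" idea concrete, here's a minimal toy sketch of the masking version: each task owns a binary mask over the parameters, and gradient steps only touch the masked subset, so a later task provably can't overwrite an earlier task's parameters. all names and the setup are illustrative, not from any specific paper.

```python
import numpy as np

params = np.zeros(8)

# assign disjoint parameter subsets to two tasks (hypothetical allocation)
mask_a = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
mask_b = ~mask_a

def masked_update(params, grad, mask, lr=0.1):
    # only the parameters selected by `mask` move; the rest stay frozen,
    # even though the gradient itself is dense
    out = params.copy()
    out[mask] -= lr * grad[mask]
    return out

# "train" task A: dense gradient, but the mask localizes the update
params = masked_update(params, np.ones(8), mask_a)
snapshot_a = params[mask_a].copy()

# later, "train" task B: task A's parameters are untouched by construction
params = masked_update(params, np.ones(8), mask_b)

assert np.array_equal(params[mask_a], snapshot_a)  # A's subset unchanged
```

real methods pick the masks rather than hardcoding them (learned per-task routing, importance-weighted penalties, etc.), but the mechanism that prevents overwriting is the same: the update simply can't reach the protected weights.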
Interesting! I’d love to see the info you saw.