I think the “cerebellum as faster feedforward distillation of recurrent cortex” idea is an interesting possibility, but the cortex also performs distillation itself through hippocampal relay and has fast feedforward modes of its own. So I have recently started putting more likelihood on the idea that the cerebellum is instead part of the learning system that helps train the cortex, in particular assisting with historical credit assignment by learning some approximate inversion, TD-style or otherwise.
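To make the “feedforward distillation of a recurrent computation” idea concrete, here is a minimal toy sketch (my own illustration, not anyone's proposed cerebellar model): a small recurrent “teacher” network produces outputs from an input stream, and a stateless feedforward “student” is fit to predict the teacher's current output from a short window of recent inputs. All sizes, the window length `K`, and the least-squares fit are arbitrary choices for the toy.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Teacher": a small recurrent network. Its output at time t depends on
# the whole input history through the hidden state h.
W_h = 0.5 * rng.standard_normal((8, 8)) / np.sqrt(8)  # recurrent weights
W_x = rng.standard_normal((8, 3))                     # input weights
W_o = rng.standard_normal((1, 8))                     # readout

def teacher(xs):
    h = np.zeros(8)
    outs = []
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x)
        outs.append(W_o @ h)
    return np.array(outs)

# "Student": a single feedforward linear map from a short window of
# recent inputs directly to the teacher's current output -- a faster,
# stateless approximation distilled from the recurrent computation.
K = 4     # window length (hypothetical choice)
T = 2000
xs = rng.standard_normal((T, 3))
ys = teacher(xs)

# Build (input window -> teacher output) pairs, fit by least squares.
X = np.stack([xs[t - K:t].ravel() for t in range(K, T)])
Y = ys[K:].reshape(-1, 1)
W_student, *_ = np.linalg.lstsq(X, Y, rcond=None)

pred = X @ W_student
mse = float(np.mean((pred - Y) ** 2))
```

Because the teacher's recurrence decays fairly quickly here, a short window captures most of what matters and the linear student beats a trivial predictor; a recurrent system with longer memory would need a wider window or a richer student.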
There are a number of interesting general proposals in the “how the brain implements backprop through time” literature. Some of the more interesting recent ones combine diffuse, nonspecific reward signals (i.e. dopamine and serotonin projections) with specific learned inversions, the two working together to provide backprop-quality credit assignment or possibly even better (since you aren’t constrained to a first-order gradient approximation).
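The “diffuse scalar signal” half of that picture is essentially classic TD learning: a single global reward-prediction error (the standard dopamine analogy) drives every local update. A minimal sketch on a toy deterministic chain, purely illustrative and not any specific proposal from that literature:

```python
# TD(0) value learning on a 5-state chain: move right each step, reward
# 1.0 only on entering the final state. Every update uses the SAME
# global scalar prediction error delta (the diffuse, nonspecific
# signal); which weight it credits is determined locally by which
# state was active.
n_states = 5
V = [0.0] * n_states      # value estimate per state
gamma, alpha = 0.9, 0.1   # discount and learning rate (arbitrary)

for episode in range(500):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]  # global scalar error
        V[s] += alpha * delta                 # local, error-gated update
        s = s_next
```

After training, the values approach gamma raised to the distance from reward (V[3] near 1.0, V[0] near 0.9**3), so temporally distant states get correctly graded credit from nothing but a broadcast scalar. The learned-inversion component would be the extra machinery needed to route such credit through deep, structured computations rather than a lookup table of states.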
All that said, the brain is definitely redundant, and the cortex implements reasonably powerful unsupervised learning all on its own (e.g. hierarchical sparse coding). But it pretty clearly also employs something at least as good as (and likely better than) gradient backprop, and learned inversions are a leading candidate. Since they are trained through a tight, timing-sensitive distillation process on a large dataset, it makes sense to use a big feedforward layer, and this also explains why the lowest sensory cortex modules (e.g. V1) are the only cortical regions that lack supporting cerebellar modules.
I should point out, though, that these aren’t necessarily even distinct computations, because both involve learning a form of predictive temporal distillation: the difference is just whether you are predicting the output itself or predicting some training signal for the output.