OK, thanks. So the deep layers in the human brain still learn, just more slowly / less data-efficiently than the deep layers in LLMs.
Doesn’t this prove too much, though? It sounds like you’re arguing that, in general, the deeper neurons in human brains need more datapoints of experience to learn anything than the deeper neurons in LLMs do. In other words, it sounds like you’re saying backprop is simply a superior learning algorithm, one that pushes updates to all the deep weights more quickly than the more local process the brain uses (a toy contrast between the two is sketched below).
But in practice humans seem to be more data-efficient than LLMs.
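A minimal sketch of the contrast at issue, under the assumption that “the more local process” means something Hebbian-like; this is my own toy formalization for illustration, not a claim about actual cortical learning rules. With backprop, the output error is chained backward so every layer’s weights receive a credit signal in one pass; with a purely local rule, each layer updates only from its own activity, and the final error never directly reaches the early layers.

```python
# Toy contrast (illustrative assumptions throughout): backprop delivers a
# credit signal to every layer of a deep net in one backward pass, whereas a
# purely local Hebbian-style rule sees only each layer's own pre/post activity.
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 1]                      # a small 3-layer MLP
Ws = [rng.normal(0, 0.5, (m, n)) for n, m in zip(sizes, sizes[1:])]

def forward(x, Ws):
    acts = [x]
    for W in Ws:
        acts.append(np.tanh(W @ acts[-1]))
    return acts

x, y = rng.normal(size=8), np.array([1.0])
acts = forward(x, Ws)

# Backprop: the output error is chained backward, so even the earliest
# weights get an update proportional to their effect on the loss.
delta = (acts[-1] - y) * (1 - acts[-1] ** 2)   # dLoss/d(pre-activation)
bp_grads = []
for i in reversed(range(len(Ws))):
    bp_grads.insert(0, np.outer(delta, acts[i]))
    if i > 0:
        delta = (Ws[i].T @ delta) * (1 - acts[i] ** 2)

# Local rule: each layer updates from its own input/output correlation only;
# no information about the final error reaches the early layers directly.
local_grads = [np.outer(acts[i + 1], acts[i]) for i in range(len(Ws))]

for i, (g_bp, g_loc) in enumerate(zip(bp_grads, local_grads)):
    print(f"layer {i}: |backprop grad|={np.linalg.norm(g_bp):.4f}  "
          f"|local update|={np.linalg.norm(g_loc):.4f}")
```

The point of the sketch is only that the backprop updates are, by construction, aligned with the task loss at every depth, while the local updates are not; whether the brain's actual mechanism pays a data-efficiency cost for that is exactly what the exchange above is debating.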
One difference is that we’re not just doing feedforward learning. One of my aforementioned hypotheses (attention [1] causes cognitive patterns far from sensory data to interact with each other, improving their coherence) points at a way learning can keep making progress even when the connection to immediate sensory prediction grows tenuous, as it does for rare stimuli; a toy version of this is sketched after the footnote.
That’s an example of a way we could be more sample-efficient than a feedforward learner, even if the latter ends up with some parts more ruthlessly optimized within their context.
[1] Human attention, not to be confused with transformer attention.
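A loose toy of that coherence hypothesis, purely illustrative: the “coherence” measure, the update rule, and treating attention as simple co-activation are all my assumptions, not claims about how human attention actually works. The only point is that two patterns far from the sensory stream can still accumulate updates just by being nudged toward mutual consistency, with no sensory prediction error in the loop.

```python
# Hypothetical toy of the coherence idea above (an illustrative
# formalization, not an established model of human attention).
# Two high-level "concept" vectors are updated toward mutual consistency
# whenever attention co-activates them; no sensory error is involved.
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=4)   # cognitive pattern A, far from sensory data
b = rng.normal(size=4)   # cognitive pattern B, far from sensory data

def coherence(a, b):
    # One possible coherence measure: cosine agreement between the patterns.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

lr = 0.1
print("before:", round(coherence(a, b), 3))
for _ in range(50):
    # Attention co-activates A and B; each is nudged toward the other,
    # so learning proceeds without any immediate sensory grounding.
    a, b = a + lr * (b - a), b + lr * (a - b)
print("after: ", round(coherence(a, b), 3))
```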