Let me take a slightly different example: echolocation.
Bats can detect differences in echo timing as short as 10 nanoseconds. Neuronal spiking maxes out around 100 Hz, i.e. roughly one spike every 10 ms, about six orders of magnitude coarser than that resolution. So the solution can’t be as simple as “throw more energy and compute at it”. It has to be “design clever circuitry that’s as close as possible to the theoretical limits of optimality”.
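To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The 10 ns and 100 Hz figures are the ones quoted above; the speed of sound (343 m/s, dry air near 20 °C) and all function names are my own assumptions for illustration.

```python
# Back-of-the-envelope arithmetic only, not a model of bat neurophysiology.
TIMING_RESOLUTION_S = 10e-9   # 10 ns, figure quoted above
MAX_SPIKE_RATE_HZ = 100.0     # ~100 Hz, figure quoted above
SPEED_OF_SOUND_M_S = 343.0    # ASSUMPTION: dry air near 20 °C


def min_spike_interval_s(rate_hz: float) -> float:
    """Shortest interval between spikes at a given maximum firing rate."""
    return 1.0 / rate_hz


def timing_gap_ratio(resolution_s: float, rate_hz: float) -> float:
    """How many times coarser the spike interval is than the timing resolution."""
    return min_spike_interval_s(rate_hz) / resolution_s


def range_resolution_m(resolution_s: float, speed_m_s: float) -> float:
    """Distance difference implied by an echo-delay difference.

    The echo travels out and back, so the delay covers twice the distance.
    """
    return speed_m_s * resolution_s / 2.0


gap = timing_gap_ratio(TIMING_RESOLUTION_S, MAX_SPIKE_RATE_HZ)
distance = range_resolution_m(TIMING_RESOLUTION_S, SPEED_OF_SOUND_M_S)

print(f"spike interval / timing resolution: {gap:.3g}")
print(f"implied range resolution: {distance * 1e6:.3g} micrometers")
```

Under these assumptions the spike interval is about a million times longer than the timing difference being resolved, and 10 ns of echo delay corresponds to under two micrometers of distance. No single neuron's firing rate gets you there, which is why the circuitry has to be doing something cleverer.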
Similarly, the brain being very efficient increases the probability I assign to “it is doing something non-(practically-)isomorphic to feed-forward ANNs”. Maybe it’s exploiting recurrence in a way that scales far more effectively with parameter count than anything we can hope to build with transformers.
But I notice I am confused and will continue to think on it.