I’m not sure what you mean precisely by “precisely meaningful”, but I do believe we know enough about how neural circuits and synapses work[1] to have some confidence that they must be doing something similar to their artificial analogs in DL systems.
So this minimally requires:
storage for a K-bit connection weight in memory
nonlinear decoding (at some synapses) of the B-bit incoming neural spike signal (timing-based)
analog ‘multiplication’[2] of incoming B-bit neural signal by K-bit weight
weight update from a local backpropagating Hebbian/gradient signal or equivalent
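The four requirements above can be sketched in a few lines. This is a toy rate-coded model; the bit widths, learning rate, and error signal are purely illustrative, not a claim about actual synaptic mechanisms:

```python
import numpy as np

def quantize(x, bits, x_max=1.0):
    """Uniformly quantize x on [-x_max, x_max] to the given bit width."""
    half = (2 ** bits - 1) / 2
    return np.round(np.clip(x, -x_max, x_max) / x_max * half) / half * x_max

K, B = 8, 4                        # weight and signal bit widths (illustrative)
w = quantize(0.37, K)              # 1. stored K-bit connection weight
x = quantize(0.62, B)              # 2. decoded B-bit incoming spike signal
y = w * x                          # 3. analog 'multiplication'

lr, err = 0.1, 0.5                 # illustrative local learning rate / error signal
w = quantize(w + lr * err * x, K)  # 4. local Hebbian/gradient-style weight update
```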
We know from DL that K and B do not need to be very large, but the optimal values are well above 1 bit. More importantly, the long-term weight storage (the equivalent of the gradient EMA/momentum) drives most of the precision demand, as it needs to accumulate many noisy measurements over time. From DL it looks like you want at least around 8 bits for long-term weight storage, even if you can sample down to 4 bits or a bit lower for the forward/backward passes.
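A toy illustration of why the long-term store needs the extra precision: if each noisy update is smaller than half the 4-bit quantization step, updating a 4-bit weight in place discards every update, while a higher-precision master weight accumulates them (all the numbers here are arbitrary):

```python
def quantize(x, bits):
    """Uniformly quantize x on [-1, 1] to the given bit width."""
    half = (2 ** bits - 1) / 2
    return round(max(-1.0, min(1.0, x)) * half) / half

lr, grad, steps = 0.01, 0.4, 100   # per-step update lr*grad = 0.004

# (a) 4-bit weight updated in place: 0.004 is below half the 4-bit step
# (~0.067), so every update rounds away and the weight never moves
w4 = 0.0
for _ in range(steps):
    w4 = quantize(w4 + lr * grad, 4)

# (b) high-precision master weight, quantized to 4 bits only for the
# forward pass: the same 100 updates accumulate to ~0.4
w_master = 0.0
for _ in range(steps):
    w_master += lr * grad
w_fwd = quantize(w_master, 4)
```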
So that just takes a certain amount of work, and if you map out the minimal digital circuits in a maximally efficient hypothetical single-electron tile technology you really do get something on the order of 1e5 minimal ~1 eV units or more[3]. Synapses are also efficient in the sense that they grow/shrink to physically represent larger/smaller logical weights, using more/fewer resources in the optimal fashion.
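For concreteness, the unit conversion behind that estimate. The ~1e5 count of ~1 eV units is the figure from the text; the synapse count and firing rate at the end are rough illustrative brain-scale numbers added for a sanity check, not claims from this comment:

```python
EV_TO_J = 1.602176634e-19          # joules per eV (exact by SI definition)

units_per_synop = 1e5              # minimal ~1 eV units per synaptic op (from text)
energy_per_synop = units_per_synop * 1.0 * EV_TO_J   # ~1.6e-14 J per op

# rough sanity check at brain scale: ~1e14 synapses at ~1 Hz average rate
power = 1e14 * 1.0 * energy_per_synop                # ~1.6 W, inside the brain's ~20 W
```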
I have also argued the other side of this—there are some DL researchers who think the brain does many OOMs more computation than it would seem, but we can rule that out with the same analysis.
The actual synaptic operations are non-linear and more complex, but do something like the equivalent work of analog multiplication, and can’t be doing dramatically more or less.
Thanks! (I’m having a hard time following your argument as a whole, and I’m also not trying very hard / being lazy / not checking the numbers; but I appreciate your answers, and they’re at least fleshing out some kind of model that feels useful to me. )
[1] To those with the relevant background knowledge in DL, accelerator designs, and the relevant neuroscience.
[3] This is not easy to do either and requires knowledge of the limits of electronics.