I could make the exact same argument about some grad student’s first DL experiment running on a GPU, on multiple levels.
I also suspect you could get rid of many neurons in their DL model without changing the computation; I suspect they aren’t working together to do any kind of larger cognition in anywhere close to the most efficient possible way.
It’s also likely they may not even know how to use the tensor cores efficiently, and even if they did, the tensor cores waste most of their compute multiplying by zeros or near-zeros, regardless of how skilled/knowledgeable the DL practitioner is.
And yet knowing all this, we still count FLOPs in the obvious way, as “hypothetical fully utilized FLOPs” is not an easy or useful quantity to measure, discuss, and compare.
Utilization of the compute resources is a higher-level software/architecture efficiency consideration, not a hardware efficiency measure.
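To make the counted-vs-utilized distinction concrete, here’s a minimal sketch. The 2·M·K·N matmul FLOP count is the standard convention; the peak-throughput and wall-clock numbers below are made up purely for illustration, not measurements of any real chip:

```python
# Counting FLOPs "the obvious way" vs. actual hardware utilization.
# The 2*m*k*n matmul count is the standard convention; the peak and
# wall-clock numbers are hypothetical, for illustration only.

def matmul_flops(m: int, k: int, n: int) -> int:
    """Nominal FLOPs for an (m x k) @ (k x n) matmul: one multiply
    and one add per inner-product term."""
    return 2 * m * k * n

def utilization(achieved_flops: int, seconds: float, peak_flop_per_s: float) -> float:
    """Fraction of the hardware's peak throughput actually used."""
    return (achieved_flops / seconds) / peak_flop_per_s

flops = matmul_flops(4096, 4096, 4096)  # ~1.37e11 nominal FLOPs
peak = 312e12                           # hypothetical accelerator peak, FLOP/s
wall = 0.002                            # hypothetical measured time, seconds
print(f"nominal FLOPs: {flops:.3e}")
print(f"utilization:   {utilization(flops, wall, peak):.1%}")
```

The point being: the nominal count is well defined and easy to compare, while the utilization figure depends on kernel quality, sparsity, and everything else above the hardware.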
> And yet knowing all this, we still count FLOPs in the obvious way, as “hypothetical fully utilized FLOPs” is not an easy or useful quantity to measure, discuss, and compare.
Given a CPU capable of a specified number of FLOPs at a specified precision, I actually can take arbitrary floats at that precision and multiply or add them in arbitrary ways at the specified rate[1].
Not so for brains, for at least a couple of reasons:
An individual neuron can’t necessarily perform an arbitrary multiply / add / accumulate operation, at any particular precision. It may be modeled by an analog MAC of a specified precision over some input range.
The software / architecture point above. For many artificial computations we care about, we can apply both micro (e.g. assembly code optimization) and macro (e.g. using a sub-cubic algorithm for matrix multiplication) optimizations to get pretty close to the theoretical limit of efficiency. Maybe the brain is already doing the analog version of these kinds of optimizations in some cases. Yes, this is somewhat of a separate / higher-level consideration, but if neurons are less repurposable and rearrangeable than transistors, it’s another reason why the FLOPs-to-SYNOPs comparison is not apples-to-apples.
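The macro-optimization gap is easy to quantify. A sketch comparing scalar-multiplication counts for naive matmul versus Strassen’s recurrence (operation counts only, not a full implementation; n is assumed to be a power of two):

```python
# Scalar multiplications for an n x n matmul (n a power of two):
# the naive algorithm needs n^3, while Strassen's recurrence
# M(n) = 7*M(n/2) does 7 recursive half-size products instead of 8,
# giving O(n^log2(7)) ~ O(n^2.81).

def naive_mults(n: int) -> int:
    return n ** 3

def strassen_mults(n: int) -> int:
    if n == 1:
        return 1
    return 7 * strassen_mults(n // 2)

for n in (64, 1024, 4096):
    print(n, naive_mults(n), strassen_mults(n),
          round(strassen_mults(n) / naive_mults(n), 3))
```

At n = 1024 Strassen already does roughly a quarter of the naive multiplications, which is exactly the kind of headroom that exists above “count every nominal op”.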
> I actually can take arbitrary floats at that precision and multiply or add them in arbitrary ways at the specified rate[1].
And? DL systems just use those floats to simulate large NNs, and a good chunk of recent progress has come from moving down to lower precision, from 32-bit to 16-bit to 8-bit and soon 4-bit or lower, chasing after the brain’s carefully tuned use of highly energy-efficient low-precision ops.
Intelligence requires exploring a circuit space, simulating circuits. The brain is exactly the kind of hardware you need to do that with extreme efficiency given various practical physical constraints.
GPUs/accelerators can match the brain in the raw low-precision op/s useful for simulating NNs (circuits), but they use far more energy to do so. More importantly, they are also severely limited by memory bandwidth, which results in an extremely poor 100:1 or even 1000:1 ALU:mem ratio; this prevents them from accelerating anything other than matrix-matrix multiplication, rather than the far more useful sparse vector-matrix multiplication.
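The bandwidth point can be restated as arithmetic intensity (FLOPs per byte of memory traffic). A sketch under assumed numbers (fp16 operands, idealized traffic, and an illustrative ~100 FLOPs-per-byte hardware balance):

```python
# Arithmetic intensity = FLOPs per byte of memory traffic.
# If the hardware needs on the order of 100 FLOPs per byte moved to
# stay busy (an assumed ALU:mem balance), dense matrix-matrix multiply
# can qualify at large n, but matrix-vector multiply never can.

BYTES = 2  # fp16 element size

def matmat_intensity(n: int) -> float:
    flops = 2 * n**3
    traffic = 3 * n * n * BYTES          # read A and B, write C (ideal caching)
    return flops / traffic

def matvec_intensity(n: int) -> float:
    flops = 2 * n**2
    traffic = (n * n + 2 * n) * BYTES    # read matrix and vector, write result
    return flops / traffic

print(matmat_intensity(4096))  # grows with n: being compute-bound is possible
print(matvec_intensity(4096))  # stuck near 1 FLOP/byte: bandwidth-bound
```

Matrix-vector intensity is pinned near 1 FLOP/byte no matter how large n gets, because every weight is touched exactly once, so a 100:1 machine idles ~99% of its ALUs on that workload.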
> Yes, this is somewhat of a separate / higher-level consideration, but if neurons are less repurposable and rearrangeable than transistors, …
This is just nonsense. A GPU cannot rearrange its internal circuitry to change precision or reallocate operations. A brain can and does, by shrinking/expanding synapses, growing new ones, etc.
> This is just nonsense. A GPU cannot rearrange its internal circuitry to change precision or reallocate operations. A brain can and does, by shrinking/expanding synapses, growing new ones, etc.
Give me some floats, and I can make a GPU do matrix multiplication, or sparse matrix multiplication, or many other kinds of computation at a variety of precisions, across the entire domain of floats at that precision.
A brain is (maybe) carrying out a computation which is modeled by a particular bunch of sparse matrix multiplications, in which the programmer has much less control over the inputs, domain, and structure of the computation.
The fact that some process (maybe) irreducibly requires some number of FLOPs to simulate faithfully is different from that process being isomorphic to that computation itself.
Intelligence requires exploring and simulating a large circuit space, i.e. by using something like gradient descent on neural networks. You can use a GPU to do that inefficiently, or you can create custom nanotech analog hardware like the brain.
The brain emulates circuits, and current AI systems on GPUs simulate circuits inspired by the brain’s emulation.
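A minimal illustration of “exploring a circuit space with gradient descent”: a two-parameter linear “circuit” fit to toy data by descending the loss gradient. Pure-Python sketch, with no claim about biological realism; the dataset and learning rate are made up:

```python
# Gradient descent over the smallest possible "circuit": y = w*x + b.
# Exploring circuit space here just means nudging (w, b) along the
# negative gradient of the mean squared error on a toy dataset.

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated by y = 2x + 1
w, b = 0.0, 0.0
lr = 0.1

for step in range(500):
    dw = db = 0.0
    for x, y in data:
        err = (w * x + b) - y
        dw += 2 * err * x / len(data)   # d(MSE)/dw
        db += 2 * err / len(data)       # d(MSE)/db
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # converges near (2.0, 1.0)
```

The same loop, scaled up to billions of parameters, is what the matmul-heavy hardware above is built to run.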
> Intelligence requires exploring and simulating a large circuit space, i.e. by using something like gradient descent on neural networks.
I don’t think neuroplasticity is equivalent to architecting and then doing gradient descent on an artificial neural network. That process is more analogous to billions of years of evolution, which encoded most of the “circuit exploration” process in DNA. In the brain, some of the weights and even connections are adjusted at “runtime”, but the rules for making those connections are necessarily encoded in DNA.
(Also, I flatly don’t buy that any of this is required for intelligence.)
[1] Modulo some concerns about I/O, generation, checking, and CPU manufacturers inflating their benchmark numbers.