Recapitulating the response of Steven Byrnes to this argument: it may be very expensive computationally to simulate a computer in a faithful way, but that doesn’t mean it’s expensive to do the same computation that the computer in question is doing. Paraphrasing a nice quote from Richard Borcherds, it may be that teapots are very hard to simulate on a classical computer, but that doesn’t mean that they are useful computational devices.
If we tried to simulate a GPU doing a simple matrix multiplication at high physical fidelity, we would have to take so many factors into account that the cost of our simulation would far exceed the cost of running the GPU itself. Similarly, if we tried to program a physically realistic simulation of the human brain, I have no doubt that the computational cost of doing so would be enormous.
However, this is not what we’re interested in doing. We’re interested in creating a computer that’s doing the same kind of computation as the brain, and the amount of useful computation that the brain could be doing per second is much less than 1e25 or even 1e20 FLOP/s. If your point is that 1e25 FLOP/s is an upper bound on how much computation the brain is doing, I agree, but there’s no reason to think it’s a tight upper bound.
It also implies the brain is doing its job much more efficiently than we know how to use an A100 to do it, but I’m not sure why that should be particularly surprising.
This claim is different from the claim that the brain is doing 1e20 FLOP/s of useful computation, which is the claim that the authors actually make. If you have an object that implements some efficient algorithm that you don’t understand, the object can be doing little useful computation even though you would need much greater amounts of computation to match its performance with a worse algorithm. The estimates coming from the brain are important because they give us a sense of how much software efficiency progress ought to be possible here.
My argument from the Landauer limit is about the number of bit erasures and doesn’t depend on the software being implemented by the brain vs. a GPU. If the brain is doing something that’s in some sense equivalent to 1e20 floating point operations per second, then based on its power consumption that would imply that it’s operating basically at the Landauer limit, perhaps only one order of magnitude off. The huge amount of noise in the brain’s interconnect alone should be enough to discredit this estimate. Whether the brain is specialized to perform one or another kind of task is not relevant for this calculation.
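As a rough illustration of the Landauer argument, here is a minimal back-of-the-envelope sketch. The 1e20 FLOP/s figure comes from the thread; the brain power of about 20 W and the operating temperature of about 310 K are my own assumed inputs, not numbers stated in the comments, and the function names are purely illustrative.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)
BODY_TEMP_K = 310.0               # ASSUMPTION: approximate body temperature
BRAIN_POWER_W = 20.0              # ASSUMPTION: commonly cited brain power draw
CLAIMED_FLOP_PER_S = 1e20         # figure discussed in the thread


def landauer_energy_per_bit(temp_k: float) -> float:
    """Minimum energy in joules to erase one bit at temperature temp_k."""
    return BOLTZMANN_J_PER_K * temp_k * math.log(2)


def max_bit_erasures_per_second(power_w: float, temp_k: float) -> float:
    """Upper bound on bit erasures per second for a given power budget."""
    return power_w / landauer_energy_per_bit(temp_k)


def erasure_budget_per_flop(power_w: float, temp_k: float, flop_per_s: float) -> float:
    """Bit erasures available per operation if all power went to erasures."""
    return max_bit_erasures_per_second(power_w, temp_k) / flop_per_s


energy_per_bit = landauer_energy_per_bit(BODY_TEMP_K)
erasures_per_s = max_bit_erasures_per_second(BRAIN_POWER_W, BODY_TEMP_K)
budget = erasure_budget_per_flop(BRAIN_POWER_W, BODY_TEMP_K, CLAIMED_FLOP_PER_S)

print(f"Landauer energy per bit: {energy_per_bit:.2e} J")
print(f"Max bit erasures per second: {erasures_per_s:.2e}")
print(f"Erasure budget per claimed FLOP: {budget:.1f}")
```

Under these assumed inputs the budget comes out at a few tens of bit erasures per claimed operation. How close that is to the limit depends on how many irreversible bit operations one counts per floating point operation, which the sketch does not try to settle.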
Perhaps you think the brain has massive architectural or algorithmic advantages over contemporary neural networks, but if you do, that is a position that has to be defended on very different grounds than “it would take X amount of FLOP/s to simulate one neuron at a high physical fidelity”.
If we tried to simulate a GPU doing a simple matrix multiplication at high physical fidelity, we would have to take so many factors into account that the cost of our simulation would far exceed the cost of running the GPU itself. Similarly, if we tried to program a physically realistic simulation of the human brain, I have no doubt that the computational cost of doing so would be enormous.
The Beniaguev paper does not attempt to simulate neurons at high physical fidelity. It merely attempts to simulate their outputs, which is a far simpler task. I am in total agreement with you that the computation needed to simulate a system is entirely distinct from the computation being performed by that system. Simulating a human brain would require vastly more than 1e21 FLOPS.
This claim is different from the claim that the brain is doing 1e20 FLOP/s of useful computation, which is the claim that the authors actually make.
Is it? I suppose they don’t say so explicitly, but it sounds like they’re using “2020-equivalent” FLOPs (or whatever it is Cotra and Carlsmith use), which has room for “algorithmic progress” baked in.
Perhaps you think the brain has massive architectural or algorithmic advantages over contemporary neural networks, but if you do, that is a position that has to be defended on very different grounds than “it would take X amount of FLOP/s to simulate one neuron at a high physical fidelity”.
I may be reading the essay wrong, but I think this is the claim being made and defended. “Simulating” a neuron at any level of physical detail is going to be irrelevantly difficult, and indeed in Beniaguev et al., running a DNN on a GPU that implements the computation a neuron is doing (four binary inputs, one output) is a 2000X speedup over solving PDEs (a combination of compression and hardware/software). They find it difficult to make the neural network smaller or shorter-memory, suggesting it’s hard to implement the same computation more efficiently with current methods.
I think you’re just reading the essay wrong. In the “executive summary” section, they explicitly state:
Our best anchor for how much compute an AGI needs is the human brain, which we estimate to perform 1e20–1e21 FLOPS.
and
In addition, we estimate that today’s computer hardware is ~5 orders of magnitude less cost efficient and energy efficient than brains.
I don’t know how you read those claims and arrived at your interpretation, and indeed I don’t know how the evidence they provide could support the interpretation you’re talking about. It would also be a strange omission to not mention the “effective” part of “effective FLOP” explicitly if that’s actually what you’re talking about.
Thanks, I see. I agree that a lot of confusion could be avoided with clearer language, but I think that, at least, they’re not making as simple an error as the one you describe in the root comment. Ted does say in the EA Forum thread that they don’t believe brains operate at the Landauer limit, but I’ll let him chime in here if he likes.
I think the “effective FLOP” concept is very muddy, but I’m even less sure what it would mean to alternatively describe what the brain is doing in “absolute” FLOPs. Meanwhile, the model they’re using gives a relatively well-defined equivalence between the logical function of the neuron and modern methods on a modern GPU.
The statement about cost and energy efficiency, as they elaborate in the essay body, is about getting human-equivalent task performance relative to paying a human worker $25/hour; it does not say that the brain uses five orders of magnitude less energy per FLOP of any kind. Closing that gap of five orders of magnitude could come either from doing less computation than the logical-equivalent neural network or from decreasing the cost of computation.