I think a better counterargument is that if a computer running a human-brain-like algorithm consumes a whopping 10,000× more power than does a human brain, who cares? The electricity costs would still be below my local minimum wage!
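To make that arithmetic concrete, here's a minimal sketch. The ~20 W brain power draw and the $0.10/kWh electricity price are my illustrative assumptions, not figures from the comment; whether the result lands above or below a given local minimum wage depends entirely on the price assumed.

```python
# Rough cost of running a brain-like algorithm at 10,000x the brain's power draw.
# Assumed numbers: ~20 W for the human brain, $0.10/kWh electricity (illustrative).
brain_power_w = 20
overhead_factor = 10_000
price_per_kwh = 0.10

power_kw = brain_power_w * overhead_factor / 1000   # 200 kW
cost_per_hour = power_kw * price_per_kwh            # dollars per hour of runtime
print(f"{power_kw:.0f} kW -> ${cost_per_hour:.2f}/hour")  # 200 kW -> $20.00/hour
```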
I agree (as counterargument to skepticism)! Right now though, “brains being much more efficient than computers” would update me towards “AGI is further away / more theoretical breakthroughs are needed”. Would love to hear counterarguments to this model.
I argue here that a much better analogy is between training an ML model versus within-lifetime learning, i.e. multiply Joe Carlsmith’s FLOP/s estimates by roughly 1 billion seconds (≈31 years, or pick a different length of time as you wish) to get training FLOP. See the “Genome = ML code” analogy table in that post.
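As a sketch of that multiplication: the 1e15 FLOP/s figure below is just one illustrative point (Carlsmith's estimates span a wide range), and the ~1e9 s lifetime is the figure from the comment above.

```python
import math

# Within-lifetime-learning analogy: training FLOP ~= brain FLOP/s x seconds of learning.
# Assumed: 1e15 FLOP/s as one illustrative point from a wide range of estimates.
brain_flop_per_s = 1e15
lifetime_s = 1e9                                 # roughly 31.7 years of seconds

training_flop = brain_flop_per_s * lifetime_s    # 1e24 FLOP
print(f"~10^{math.log10(training_flop):.0f} FLOP of 'training'")
```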
Great point. Copying from my response to Slider: “Cotra’s lifetime anchor is 10^29 FLOPs (so 4-5 OOMs above gradient descent). That’s still quite a chasm.”
I didn’t check just now, but I vaguely recall that there’s a several-orders-of-magnitude (maybe 3?) difference between the FLOP/J of a supercomputer and the FLOP/J of a GPU.
This paper suggests 100 GFLOPs/W in 2020 (within an OOM of Fugaku). I don’t know how much progress there’s been in the last two years.
I think that’s a bad tradeoff. FLOP reads just fine. Clear communication is more important!! :)
> Right now though, “brains being much more efficient than computers” would update me towards “AGI is further away / more theoretical breakthroughs are needed”. Would love to hear counterarguments to this model.
I don’t understand why you would update that way, so I’m not sure how to counterargue.
For example, suppose that tomorrow Intel announced that they’ve had a breakthrough in carbon nanotube transistors or whatever, and therefore future generations of chips will be 10× more energy efficient per FLOP. If I understand correctly, on your model, you would see that announcement and say “Oh wow, carbon nanotube transistors, I guess now I should update to AGI is closer / fewer theoretical breakthroughs are needed.” Whereas on my model, that announcement is interesting but has a very indirect impact on what I should think and expect about AGI. Can you say more about where you’re coming from there?
> This paper suggests 100 GFLOPs/W in 2020 (within an OOM of Fugaku). I don’t know how much progress there’s been in the last two years.
A100 datasheet says 624 TFLOP/s / 250 W ≈ 10^−12.4 J/FLOP. So ≈1 OOM lower than the supercomputer Fugaku. Good to know!
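For anyone checking the conversion: J/FLOP is just watts divided by FLOP/s. The 624 TFLOP/s and 250 W figures are the A100 datasheet numbers quoted above.

```python
import math

# J/FLOP = watts / (FLOP/s), using the A100 datasheet figures quoted above.
a100_flop_per_s = 624e12   # peak FLOP/s from the datasheet
a100_watts = 250           # TDP from the datasheet

joules_per_flop = a100_watts / a100_flop_per_s
print(f"10^{math.log10(joules_per_flop):.1f} J/FLOP")   # 10^-12.4 J/FLOP
```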
Let me take a slightly different example: echolocation.
Bats can detect differences in period as short as 10 nanoseconds. Neuronal spiking maxes out around 100 Hz. So the solution can’t just be as simple as “throw more energy and compute at it”. It’s a question of “design clever circuitry that’s as close as possible to theoretical limits on optimality”.
Similarly, the brain being very efficient increases the probability I assign to “it is doing something non-(practically-)isomorphic to feed-forward ANNs”. Maybe it’s exploiting recurrence in a way that scales far more effectively with parameter count than anything we can hope to build with transformers.
But I notice I am confused and will continue to think on it.
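A back-of-envelope version of the bat timing point above: with the comment's figures (100 Hz spiking, 10 ns discrimination), the gap between the inter-spike interval and the discriminated timing difference is about six orders of magnitude, which is why simple rate coding seems insufficient.

```python
import math

# Gap between neural spike timescales and bat echolocation timing discrimination.
# Figures from the comment above: ~100 Hz max spiking, ~10 ns period discrimination.
spike_rate_hz = 100
inter_spike_interval_s = 1 / spike_rate_hz   # 10 ms between spikes
discrimination_s = 10e-9                     # 10 ns period differences detected

gap_ooms = math.log10(inter_spike_interval_s / discrimination_s)
print(f"{gap_ooms:.0f} OOMs between spike interval and discrimination")  # 6 OOMs
```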
> This paper suggests 100 GFLOPs/W in 2020 (within an OOM of Fugaku). I don’t know how much progress there’s been in the last two years.
Good point! I’ve updated the text.