I disagree with the brain-based discussion of how much compute is required for AGI. Here’s an analogy I like (from here):
Left: Suppose that I want to model a transistor (specifically, a MOSFET). And suppose that my model only needs to be sufficient to emulate the calculations done by a CMOS integrated circuit. Then my model can be extremely simple—it can just treat the transistor as a cartoon switch. (image source.)
Right: Again suppose that I want to model a transistor. But this time, I want my model to accurately capture all measurable details of the transistor. Then my model needs to be mind-bogglingly complex, involving dozens of adjustable parameters, some of which are shown in this table (screenshot from here).
What’s my point? I’m suggesting an analogy between this transistor and a neuron with synapses, dendritic spikes, etc. The latter system is mind-bogglingly complex when you study it in detail—no doubt about it! But that doesn’t mean that the neuron’s essential algorithmic role is equally complicated. The latter might just amount to a little cartoon diagram with some ANDs and ORs and IF-THENs or whatever. Or maybe not, but we should at least keep that possibility in mind.
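To make the left-hand picture concrete, here's a minimal sketch (my own toy code, not from the post) of a CMOS NAND gate in which each MOSFET is modeled as nothing but a boolean switch:

```python
# Toy switch-level model: each MOSFET is just "conducts iff gate is high/low".
# This is sufficient to emulate CMOS logic, despite ignoring every physical detail.

def nmos(gate: bool) -> bool:
    """NMOS as a cartoon switch: conducts when the gate is high."""
    return gate

def pmos(gate: bool) -> bool:
    """PMOS as a cartoon switch: conducts when the gate is low."""
    return not gate

def nand(a: bool, b: bool) -> bool:
    # Pull-up network: two PMOS in parallel connect the output to Vdd.
    pull_up = pmos(a) or pmos(b)
    # Pull-down network: two NMOS in series connect the output to ground.
    pull_down = nmos(a) and nmos(b)
    assert pull_up != pull_down  # static CMOS: exactly one network conducts
    return pull_up  # output is high iff connected to Vdd

print([nand(a, b) for a in (False, True) for b in (False, True)])
# -> [True, True, True, False]
```

A detailed device model (BSIM-style, with its dozens of fitted parameters per transistor) would be enormously more complex, yet none of that complexity matters for the logic above.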
--
For example, this paper is what I consider a plausible algorithmic role of dendritic spikes and synapses in cortical pyramidal neurons, and the upshot is “it’s basically just some ANDs and ORs”. If that’s right, this little bit of brain algorithm could presumably be implemented with <<1 FLOP per spike-through-synapse. I think that’s a suggestive data point, even if (as I strongly suspect) dendritic spikes and synapses are meanwhile doing other operations too.
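As an illustration of what "basically just some ANDs and ORs" could mean, here is a toy sketch of mine (not the paper's actual model): treat each dendritic branch as an AND over its coactive synapses, and the soma as an OR over branches:

```python
# Toy "two-layer" neuron, a hypothetical simplification: a dendritic branch
# spikes only if all of its synapses are coactive (an AND), and the soma
# fires if any branch spikes (an OR).

def branch_spikes(synapse_inputs: list[bool]) -> bool:
    return all(synapse_inputs)  # coincidence detection on one branch

def neuron_fires(branches: list[list[bool]]) -> bool:
    return any(branch_spikes(b) for b in branches)

# One fully coactive branch is enough to fire the cell:
print(neuron_fires([[True, True], [True, False]]))   # -> True
# No branch has all synapses active, so the cell stays silent:
print(neuron_fires([[True, False], [False, True]]))  # -> False
```

Each evaluation here costs a handful of boolean operations, which is the kind of picture that makes "<<1 FLOP per spike-through-synapse" plausible.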
--
Anyway, I currently think that, based on the brain, human-speed AGI is probably possible with 1e14 FLOP/s. (This post has a red-flag caveat on top, but that’s related to some issues in my discussion of memory; I stand by the compute section.) Not with current algorithms, I don’t think! But with some future algorithm.
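One back-of-envelope way to land near that figure (my own arithmetic with round textbook numbers, not the post's exact derivation): multiply synapse count by average firing rate by cost per event:

```python
# Order-of-magnitude sanity check; all three numbers are round assumptions.
synapses = 1e14           # ~100 trillion synapses in the human brain
avg_firing_rate_hz = 1.0  # average rates are often quoted at ~0.1-2 Hz
flop_per_event = 1.0      # <=1 FLOP per spike-through-synapse, per the argument above

flops = synapses * avg_firing_rate_hz * flop_per_event
print(f"{flops:.0e} FLOP/s")  # -> 1e+14 FLOP/s
```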
I think that running microscopically-accurate brain simulations is many OOMs harder than running the algorithms that the brain is running. This is the same idea as the fact that running a microscopically-accurate simulation of a pocket calculator microcontroller chip, with all its thousands of transistors and capacitors and wires, stepping the simulation forward picosecond-by-picosecond, as the simulated chip multiplies two numbers, is many OOMs harder than multiplying two numbers.
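The overhead in the calculator analogy can be made roughly quantitative (all figures below are illustrative assumptions, good to an OOM at best):

```python
# How much compute does it take to "multiply two numbers" by simulating
# the chip microscopically? Every number here is an illustrative assumption.
transistors   = 5_000        # small microcontroller-class chip
sim_steps     = 1_000_000    # 1 microsecond of chip time at 1 ps resolution
ops_per_state = 10           # FLOPs to update one transistor's state per step

sim_cost    = transistors * sim_steps * ops_per_state
direct_cost = 1              # one FLOP: just multiply the two numbers
print(f"simulation: {sim_cost:.0e} FLOP vs direct: {direct_cost} FLOP")
# -> simulation: 5e+10 FLOP vs direct: 1 FLOP
```

Even with these conservative toy numbers, the microscopic simulation is ~10 OOMs more expensive than the computation the chip is actually performing.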
Excellent points. Agree that the compute needed to simulate a thing is not equal to the compute performed by that thing. It’s very possible this means we’re overestimating the compute performed by the human brain a bit. Possible this is counterbalanced by early AGIs being inefficient, or having architectural constraints that the human brain lacks, but who knows. Very possible our 16% is too low, and should be higher. Tripling it to ~50% would yield a likelihood of transformative AGI of ~1.2%.
Dropping the required compute by, say, two OOMs changes the estimates of how many fabs and how much power will be needed from “massively more than expected from business as usual” to “not far from business as usual”; in other words, that 16% would need to be >>90%, because by default the capacity would exist anyway. The same change would have the same kind of effect on the “<$25/hr” assumption. At that scale, “just throw more compute at it” becomes a feasible enough solution that “learns slower than humans” stops seeming like a plausible problem as well. I think you might be assuming you’ve made these estimates independently when they’re actually still being calculated from common assumptions.
According to our rough and imperfect model, dropping inference needs by 2 OOMs increases our likelihood of hitting the $25/hr target by 20%abs, from 16% to 36%.
It doesn’t necessarily make a huge difference to chip and power scaling, as in our model those are dominated by our training estimates, not our inference need estimates. (Though of course those figures will be connected in reality.)
With no adjustment to chip and power scaling, this yields a 0.9% likelihood of TAGI.
With a +15%abs bump to chip and power scaling, this yields a 1.2% likelihood of TAGI.
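These updates can be reproduced directly, assuming the overall likelihood is a product of conditional probabilities, so that replacing one factor rescales the product by (new / old). The ~0.4% baseline is the paper's headline figure; the 46% chip-and-power factor is the corrected value mentioned later in this thread:

```python
# Rescaling the headline product when individual factors change.
# Assumption: the overall likelihood is a simple product of factors.
baseline = 0.004  # the paper's ~0.4% headline likelihood of TAGI

# Inference-cost factor raised from 16% to 36% (2-OOM cheaper inference):
step1 = baseline * (0.36 / 0.16)
print(f"{step1:.1%}")  # -> 0.9%

# Additionally bump chip & power scaling by +15 points, 46% -> 61%:
step2 = step1 * (0.61 / 0.46)
print(f"{step2:.1%}")  # -> 1.2%
```

The same arithmetic reproduces the earlier "tripling 16% to ~50% yields ~1.2%": 0.4% × 3 = 1.2%.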
Ah, sorry, I see I made an important typo in my comment: that 16% value I mentioned was supposed to be 46%, because it was in reference to the chip fabs & power requirements estimate.
The rest of the comment after that was my way of saying “the fact that these dependencies on common assumptions exist between the different conditional probabilities at all means you can’t really claim that you can multiply them all together and consider the result meaningful in the way described here.”
I say that because the dependencies mean you can’t productively discuss disagreements about any of the assumptions that go into your estimates without adjusting all the probabilities in the model. A single updated assumption or estimate breaks the claim of conditional independence that lets you multiply the probabilities.
For example, in a world that actually had “algorithms for transformative AGI” that were just too expensive to use productively, what would happen next? Well, my assumption is that a lot more companies would hire a lot more humans to work on making them more efficient, using the best available less-transformative tools. A lot of governments would invest trillions in building the fabs and power plants and mines needed to run AGI anyway, even if it still cost $25,000/human-equivalent-hr. They’d then turn the AGI loose on the problem of improving its own efficiency. And on making better robots. And on using those robots to make more robots and build more power plants and mine more materials. Once producing more inputs is automated, supply stops being limited by human labor, and doesn’t require more high-level AI inference either. The cost of inputs into increasing AI capabilities becomes decoupled from the human economy, so the price of electricity and compute in dollars plummets. This is one of many hypothetical pathways where a single disagreement renders consideration of the subsequent numbers moot. Presenting the final output as a single number hides the extreme sensitivity of that number to changes in key underlying assumptions.
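The independence point above can be demonstrated with a toy Monte Carlo example of mine: when two "conditions" both hinge on a shared underlying assumption, the product of their marginal probabilities can badly misstate the joint probability:

```python
import random

random.seed(0)

# Two events that both depend on a shared latent assumption
# (e.g. "compute turns out cheap"); each is true when the latent
# draw clears its own threshold. Thresholds are illustrative.
N = 100_000
a_hits = b_hits = both = 0
for _ in range(N):
    latent = random.random()  # shared underlying uncertainty
    a = latent > 0.6          # e.g. "inference < $25/hr"
    b = latent > 0.7          # e.g. "fabs & power scale up"
    a_hits += a
    b_hits += b
    both += a and b

p_a, p_b, p_ab = a_hits / N, b_hits / N, both / N
print(f"P(A)*P(B) = {p_a * p_b:.3f}, but P(A and B) = {p_ab:.3f}")
# With perfect dependence on the latent, B implies A, so
# P(A and B) = P(B) ~= 0.3, while the naive product is
# ~0.4 * 0.3 = 0.12 -- off by more than 2x.
```

Multiplying the marginals is only valid under independence; the moment the factors share assumptions, the product stops being the joint probability.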
Regarding “overestimating the compute performed by the human brain a bit”: specifically, by 6-8 OOMs. I don’t think that’s “a bit.” ;)