Of course if we talk longer term, the brain is obviously evidence that one human-brain's worth of compute can be achieved in about 10 watts, so a 1GW power plant could support a population of 100 million uploads or neuromorphic AGIs. That's very much part of my model (and Hanson's, and Moravec's), eventually.
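A minimal sanity check of that arithmetic, just restating the ~10 W and 1GW figures above:

```python
# Sanity check of the figure above: brain-level compute at ~10 W
# implies ~100 million brain-equivalents per 1 GW power plant.
plant_power_w   = 1e9    # 1 GW
watts_per_brain = 10.0   # per-brain power figure cited above

uploads_supported = plant_power_w / watts_per_brain
print(f"{uploads_supported:,.0f} uploads / neuromorphic AGIs")  # 100,000,000
```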
This seems like it straightforwardly agrees that energy efficiency is not in any way a bottleneck, so I don’t understand the focus of this post on efficiency.
I also don’t know what you mean by longer term. “Moore Room at the Bottom” was of course also talking about the longer term (you can’t build new hardware in a few weeks unless you have nanotech, but then you can also build new factories in a few weeks), so I don’t understand why you are suddenly talking as if “longer term” were some kind of shift of topic.
Eliezer’s model is that we definitely won’t have many decades with AIs smarter but not much smarter than humans, since there appear to be many ways to scale up intelligence, both via algorithmic progress and via hardware progress. Eliezer thinks that Drexlerian nanotech is one of the main ways to do this, and if you buy that premise, then the efficiency arguments don’t really matter, since clearly you can just scale things up horizontally and build a bunch of GPUs. But even if you don’t, you can still scale things up horizontally and increase GPU production (and in any case, energy efficiency is not the bottleneck here; it’s GPU production, which this post doesn’t talk about).
A quick Google search indicates 1GW is a typical power plant output, which in theory could power a datacenter of roughly a million GPUs. This is almost 100 times larger in power consumption than the current largest official supercomputer, Frontier, which has about 30k GPUs. The supercomputer used to train GPT-4 is somewhat of a secret, but is estimated to be about that size. So at 50x to 100x you are talking about scaling up to something approaching a hypothetical GPT-5-scale cluster.
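A rough back-of-envelope version of that scale-up; the ~1 kW all-in draw per GPU and the ~20 MW draw for a ~30k-GPU cluster are illustrative assumptions of mine, not figures from this thread:

```python
# Back-of-envelope for the scale-up above. Per-GPU and cluster power
# draws are assumed for illustration; only the 1 GW plant and ~30k-GPU
# cluster size come from the comment.
plant_power_w   = 1e9      # 1 GW power plant
watts_per_gpu   = 1_000    # assumed all-in draw per enterprise GPU (device + overhead)
cluster_gpus    = 30_000   # Frontier / GPT-4-scale cluster size cited above
cluster_power_w = 20e6     # assumed total draw of such a cluster

gpus_supported = plant_power_w / watts_per_gpu    # ~1,000,000 GPUs
count_ratio    = gpus_supported / cluster_gpus    # ~33x more GPUs
power_ratio    = plant_power_w / cluster_power_w  # ~50x the power draw
print(f"{gpus_supported:,.0f} GPUs ({count_ratio:.0f}x count, {power_ratio:.0f}x power)")
```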
Nvidia currently produces less than 100k high-end enterprise GPUs per year in total, so you can’t even build this datacenter unless Nvidia grows by about 10x and TSMC grows by perhaps 2x.
The datacenter would likely cost over a hundred billion dollars, and the resulting models would be proportionally more expensive to run, such that it’s unclear whether this would be a win (at least using current tech). Sure, I do think there is some room for software improvement.
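For a sense of where a number like that comes from, an illustrative cost sketch; the per-GPU price and overhead multiplier are assumptions rather than claims from this thread, and they land in the same ballpark as the figure above:

```python
# Illustrative cost arithmetic for a ~1M-GPU datacenter. The per-GPU price
# and the all-in overhead multiplier are assumptions, chosen only to show
# the order of magnitude.
num_gpus          = 1_000_000
price_per_gpu_usd = 30_000   # assumed high-end enterprise GPU price
overhead_multiple = 3.0      # assumed networking, facility, power, cooling

gpu_capex   = num_gpus * price_per_gpu_usd   # ~$30B in GPUs alone
total_capex = gpu_capex * overhead_multiple  # ~$90B+ all-in
print(f"GPUs: ${gpu_capex/1e9:.0f}B, all-in: ~${total_capex/1e9:.0f}B")
```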
I don’t understand the relevance of this. You now seem to be talking about a completely different scenario than what I understood Eliezer to be talking about. Eliezer does not think that a slightly superhuman AI would be capable of improving the efficiency of its hardware completely on its own.
Both scenarios (going big, in that you use whole-power-plant levels of energy, or going small, in that you improve the efficiency of chips) require changing semiconductor manufacturing, which is unlikely to be one of the first things a nascent AI does, unless it successfully develops and deploys Drexlerian nanotech. Eliezer in his model here was talking about what reasonable limits we would be approaching relatively soon after an AI passes human level.
Remember, this post is all about critiquing EY’s specific doom model, which involves fast foom on current hardware through recursive self-improvement.
I don’t understand the relevance of thermodynamic efficiency to a foom scenario “on current hardware”. You are not going to change the thermodynamic efficiency of the hardware you are literally running on; you have to build new hardware for that either way.
To reiterate, the model of EY that I am critiquing is one where an AGI rapidly fooms through many OOM of efficiency improvements. All the key required improvements are efficiency improvements: it needs to improve its world modelling/planning per unit of compute, and/or improve compute per dollar, and/or compute per joule, etc.
In EY’s model there are some, perhaps many, OOM of software improvements over the initial NN architectures/algorithms, perhaps then continued with more OOM of hardware improvements. I don’t believe “buying more GPUs” is a key part of his model; it is far, far too slow to provide even one OOM upgrade. Renting/hacking your way to even one OOM more GPUs is also largely unrealistic (I run one of the larger GPU compute markets and talk to many suppliers, so I have inside knowledge here).
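A rough sketch of why even one OOM via purchasing is implausible, using the cluster size and production figures cited earlier (all approximate):

```python
# Sketch of why buying or renting one's way to +1 OOM of GPUs is slow,
# using the figures cited earlier in the thread (~30k-GPU GPT-4-scale
# cluster, <100k high-end enterprise GPUs produced per year).
current_cluster_gpus   = 30_000                     # GPT-4-scale cluster
target_cluster_gpus    = current_cluster_gpus * 10  # one OOM more GPUs
annual_production_gpus = 100_000                    # total yearly high-end output

extra_gpus_needed      = target_cluster_gpus - current_cluster_gpus  # 270,000
years_of_global_output = extra_gpus_needed / annual_production_gpus  # ~2.7 years
print(f"Need {extra_gpus_needed:,} more GPUs "
      f"= roughly {years_of_global_output:.1f} years of total global production")
```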
Both scenarios (going big, in that you use whole-power-plant levels of energy, or going small, in that you improve the efficiency of chips) require changing semiconductor manufacturing, which is unlikely to be one of the first things a nascent AI does, unless it successfully develops and deploys Drexlerian nanotech.
Right, so I have arguments against Drexlerian nanotech (“Moore Room at the Bottom”, but also the thermodynamic constraints indicating you just can’t get many OOM from nanotech alone), and separate arguments against many OOM from software (“Mind Software Efficiency”).
I don’t understand the relevance of thermodynamic efficiency to a foom scenario “on current hardware”.
It is mostly relevant to the Drexlerian nanotech, as it shows there likely isn’t much improvement available over GPUs for all the enormous effort. If nanotech were feasible and could easily allow computers 6 OOM more efficient than the brain using about the same energy/space/materials, then I would agree more with his argument.
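For scale, the purely illustrative arithmetic behind that hypothetical, reusing the ~10 W brain figure from the top of this thread:

```python
# Illustrative arithmetic for the hypothetical above: nanotech computers
# 6 OOM more energy-efficient than the brain (~10 W per brain-equivalent,
# per the figure cited at the top of the thread).
watts_per_brain_equiv = 10.0  # brain-equivalent compute, cited above
efficiency_gain       = 1e6   # hypothetical 6 OOM improvement
plant_power_w         = 1e9   # 1 GW power plant

brain_equivs = plant_power_w * efficiency_gain / watts_per_brain_equiv
print(f"{brain_equivs:.0e} brain-equivalents per 1 GW plant")  # 1e+14
```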