I’m pretty sure what he means by short timelines giving less compute overhang is this: if we were to somehow delay working on AGI for, say, ten years, compute would improve so much that an AGI could probably run on a small cluster or even a laptop. The implied claim is that current generations of machines aren’t adequate to run a superintelligent set of networks, or at least that doing so would take massive and noticeable amounts of compute.
I don’t think he’s addressing algorithmic improvements to compute efficiency at all. But it seems to me that they’d push in the same direction: delaying work on AGI would also produce more algorithmic improvements, making it even easier for small projects to create a dangerous superintelligence.
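To put rough numbers on that compounding worry, here’s a back-of-the-envelope sketch. Both doubling times are made-up placeholders, not estimates anyone has defended:

```python
# Toy sketch of how hardware and algorithmic progress could compound over a
# ten-year delay. Both doubling times below are hypothetical placeholders.

years = 10
hardware_doubling_years = 2.0  # assumed price-performance doubling time
algo_doubling_years = 2.0      # assumed algorithmic-efficiency doubling time

hardware_gain = 2 ** (years / hardware_doubling_years)  # 32x cheaper compute
algo_gain = 2 ** (years / algo_doubling_years)          # 32x less compute needed

# Combined, a project in ten years would need ~1000x less (cost-equivalent)
# compute than one today, on these made-up assumptions.
print(hardware_gain * algo_gain)  # 1024.0
```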
I’m not sure I agree with his conclusion that short timelines are best, but I’m not sure it’s wrong, either. It’s complex because it depends on our ability to govern the rate of progress, and I don’t think anyone has a very good guess at this yet.
Okay, I’m also not sure if I agree with the conclusion, but the argument makes sense that way. I just feel like it’s a confusing use of terminology.
I think it would be clearer to phrase it slightly differently, distinguishing “(a) we keep working on TAI and it takes ~10 years to build” from “(b) we stop research for 10 years and then build AGI almost immediately, which also takes ~10 years in total.” Both of those are “10-year timelines,” but only (a) makes a claim about the dangers of not pushing forward as much as possible. And (a) implies higher “2020 training compute requirements” (the notion from Ajeya’s framework for estimating timelines under the assumption of continued research) than (b), because (a) involves more algorithmic progress.
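To make the (a)/(b) contrast concrete, here’s a toy sketch of the “2020 training compute requirements” idea. The requirement figures and the halving time are placeholders I made up, not numbers from Ajeya’s report:

```python
# Toy model: algorithmic progress halves the compute needed to train TAI
# (relative to 2020 algorithms), but only while research continues.
# All numbers are hypothetical placeholders.

def effective_requirement(req_2020_flop, years_of_research, halving_time=2.5):
    """FLOP needed to train TAI after some years of algorithmic progress."""
    return req_2020_flop / 2 ** (years_of_research / halving_time)

# Scenario (a): TAI in 2030 after 10 years of continued research. For it to
# arrive only then, the 2020-algorithm requirement must have been high.
req_a = 1e36  # hypothetical 2020 training compute requirement (FLOP)
print(effective_requirement(req_a, years_of_research=10))  # ~6e34 FLOP

# Scenario (b): research pauses for 10 years, then TAI is built "almost
# immediately" with ~2020 algorithms. That's only possible if the 2020
# requirement was already low, i.e. req_b << req_a.
req_b = 1e33  # hypothetical: already within reach once research resumes
print(effective_requirement(req_b, years_of_research=0))  # 1e33 FLOP
```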
It was brought to my attention that not everyone might use the concept of a “compute overhang” the same way.
In my terminology, there’s a (probabilistic) compute overhang to the degree that the following could happen: we invent an algorithm that will get us to TAI before we even max out compute scaling as much as we currently could.
So, on my definition, there are two ways in which we might already be in a compute overhang (a toy sketch follows the list):
(1) Timelines are very short and we could get TAI with “current algorithms” (not necessarily GPT_n with zero tweaks, but obvious things to try that require no special insight) with less scaling effort than a Manhattan project.
(2) We couldn’t get TAI with current algorithms via any less-than-maximal scaling effort (and maybe not even with a maximal one – that part isn’t relevant for the claim), but there are highly significant algorithmic insights waiting for us (that we have a realistic chance of discovering). Once we incorporate these insights, we’ll be in the same situation as described in (1).
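Here’s a minimal sketch of those two cases. The threshold `manhattan_flop` is a hypothetical stand-in for a Manhattan-project-scale training effort, and the requirement figures are arbitrary:

```python
# Minimal sketch of the two overhang cases above. All numbers are
# hypothetical stand-ins, not estimates.

def overhang_case(req_current_algos, req_after_insights, manhattan_flop=1e30):
    """Classify which (if either) overhang case we're in.

    req_current_algos:  FLOP to reach TAI with current algorithms.
    req_after_insights: FLOP to reach TAI after realistic algorithmic insights.
    """
    if req_current_algos <= manhattan_flop:
        return "(1): current algorithms suffice with sub-Manhattan scaling"
    if req_after_insights <= manhattan_flop:
        return "(2): realistic insights would put us in situation (1)"
    return "no compute overhang on this definition"

# Hypothetical example: current algorithms are far out of reach, but
# plausible insights would bring the requirement below the threshold.
print(overhang_case(req_current_algos=1e35, req_after_insights=1e29))
```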
I would’ve guessed that Sam Altman was using it the same way, but now I’m not sure anymore.
I guess another way to use the concept is the following:
Once we build AGI with realistic means, using far-from-optimal algorithms, how much room is there for it to improve its algorithms during “takeoff”/intelligence explosion? “Compute overhang” here describes the gap between the compute used to build AGI in the first place and the compute that more efficient, AI-discovered designs would need.
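On this usage the overhang is just a ratio; a trivial sketch, with placeholder numbers:

```python
# Sketch of "compute overhang" as the efficiency gap an AGI could close
# during takeoff. Both figures are hypothetical placeholders.

def takeoff_overhang(build_flop, efficient_design_flop):
    """How overbuilt the first AGI is, relative to the most efficient
    design that AI-aided research could quickly discover."""
    return build_flop / efficient_design_flop

print(takeoff_overhang(build_flop=1e34, efficient_design_flop=1e30))  # 10000.0
```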
On that definition, it’s actually quite straightforward that shorter timelines imply less compute overhang.
Also, this definition arguably matches the context from Bostrom’s Superintelligence more closely, where I first came across the concept of a “hardware overhang.” Bostrom introduced the concept when he was discussing hard takeoff vs. soft takeoff.
(To complicate matters, there’s been a shift in takeoff speeds discussions where many people are now talking about pre-TAI/pre-AGI speeds of progress, whereas Bostrom was originally focusing on claims about post-AGI speeds of progress.)