Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination and more likely to lead to a slower takeoff due to less of a compute overhang, and a slower takeoff gives us more time to figure out empirically how to solve the safety problem and how to adapt.
I don’t understand the part about “less of a compute overhang.” There’s still room to ramp up compute use in the next few years, so if timelines are very short, that means transformative AI happens when we’re not yet pushing the limits of compute. That seems to me like exactly the scenario where the compute overhang is uncontroversially quite large?
Conceivably, there could also be a large compute overhang in other scenarios (where actors are pushing the competitive limits of compute use). However, wouldn’t that depend on the nature of algorithmic progress? If you think present-day algorithms combined with “small” improvements can’t get us to transformative AI but a single (or small number of) game-changing algorithmic insight(s) will get us there, then I agree that “the longer it takes us to discover the algorithmic insight(s), the bigger the compute overhang.” Is that the view here?
If so, that would be good to know, because I thought many people were somewhat confident that algorithmic progress is unlikely to be “jumpy” in that way? (Admittedly, that never seemed like a rock-solid assumption to me.) If not, does anyone know how the statement that short timelines imply less of a compute overhang was meant?
I’m pretty sure what he means by short timelines giving less compute overhang is this: if we were to somehow delay working on AGI for, say, ten years, compute would improve so much that an AGI could probably run on a small cluster or even a laptop. The implied claim is that current generations of machines aren’t adequate to run a superintelligent set of networks, or at least that it would take massive and noticeable amounts of compute.
I don’t think he’s addressing algorithmic improvements to compute efficiency at all. But it seems to me that they’d push in the same direction; delaying work on AGI would also produce more algorithmic improvements, making it even easier for small projects to create a dangerous superintelligence.
I’m not sure I agree with his conclusion that short timelines are best, but I’m not sure it’s wrong, either. It’s complex because it depends on our ability to govern the rate of progress, and I don’t think anyone has a very good guess at this yet.
Okay, I’m also not sure if I agree with the conclusion, but the argument makes sense that way. I just feel like it’s a confusing use of terminology.
I think it would be clearer to phrase it slightly differently, to distinguish “(a) we keep working on TAI and it takes ~10 years to build” from “(b) we stop research for 10 years and then build AGI almost immediately, so it also takes ~10 years.” Both of those are “10-year timelines,” but (a) is the scenario behind the claim about the dangers of not pushing forward as hard as possible, and (a) implies higher “2020 training compute requirements” (the notion from Ajeya’s framework for estimating timelines under the assumption of continued research) than (b), because it involves more algorithmic progress.
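To make that last point concrete, here’s a toy calculation with numbers I made up purely for illustration (not from Ajeya’s report): suppose the physical compute available for a frontier training run grows ~10x over the decade in both scenarios, and continued research in (a) additionally makes algorithms ~10x more efficient, while the research freeze in (b) leaves efficiency unchanged.

```python
# Toy sketch (made-up numbers) of why scenario (a) is consistent with a higher
# "2020 training compute requirement" than scenario (b).

compute_2020 = 1.0        # frontier training compute in 2020 (arbitrary units)
hardware_growth = 10.0    # assumed growth in affordable compute over ten years
algo_gain = {"a": 10.0,   # (a): ten years of continued algorithmic progress
             "b": 1.0}    # (b): research frozen, so 2020 algorithms only

for scenario, gain in algo_gain.items():
    # Largest 2020-algorithm compute requirement that would still be reachable
    # at the end of the decade, in multiples of 2020 frontier compute.
    max_requirement = compute_2020 * hardware_growth * gain
    print(f"({scenario}) reachable 2020-compute requirement: up to {max_requirement:.0f}x")
```

On these (illustrative) assumptions, a world like (a) can reach TAI even if the 2020-algorithm requirement is ~100x today’s frontier compute, whereas (b) only reaches it if the requirement is ~10x, which is the sense in which (a) implies higher requirements.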
It was brought to my attention that not everyone might use the concept of a “compute overhang” the same way.
In my terminology, there’s a (probabilistic) compute overhang to the degree that the following could happen: we invent an algorithm that will get us to TAI before we even max out compute scaling as much as we currently could.
So, on my definition, there are two ways in which we might already be in a compute overhang:
(1) Timelines are very short and we could get TAI with “current algorithms” (not necessarily GPT_n with zero tweaks, but obvious things to try that require no special insight) with less scaling effort than a Manhattan project.
(2) We couldn’t get TAI with current algorithms via any less-than-maximal scaling effort (and maybe not even with a maximal one – that part isn’t relevant for the claim), but there are highly significant algorithmic insights waiting for us (that we have a realistic chance of discovering). Once we incorporate these insights, we’ll be in the same situation as described in (1).
I would’ve guessed that Sam Altman was using it the same way, but now I’m not sure anymore.
I guess another way to use the concept is the following:
Once we build AGI with realistic means, using far-from-optimal algorithms, how much room is there for it to improve its algorithms during “takeoff”/intelligence explosion? “Compute overhang” here describes the gap between the compute needed to build AGI in the first place and the compute needed by the more efficient designs that AI-aided progress could quickly discover.
On that definition, it’s actually quite straightforward that shorter timelines imply less compute overhang.
Also, this definition arguably matches more closely the context in Bostrom’s Superintelligence, where I first came across the concept of a “hardware overhang”; Bostrom introduced the concept while discussing hard takeoff vs. soft takeoff.
(To complicate matters, there’s been a shift in takeoff speeds discussions where many people are now talking about pre-TAI/pre-AGI speeds of progress, whereas Bostrom was originally focusing on claims about post-AGI speeds of progress.)
I think that time later is significantly more valuable than time now (and time now is much more valuable than time in the old days). Safety investment and other kinds of adaptation increase greatly as the risks become more immediate (capabilities investment also increases, but that’s already factored in); safety research gets way more useful (I think most of the safety community’s current work is 10x+ less valuable than the same work done closer to catastrophe, even if the average multiplier is lower than that). Having a longer period closer to the end seems really, really good to me.
If we lose 1 year now and get back 0.5 years later, and if years later are 2x as good as years now, we’d be breaking even.
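Spelled out as a minimal arithmetic check (the 2x multiplier is just the assumption from the previous sentence, not anyone’s actual estimate):

```python
# Break-even check: trade 1 year now for 0.5 years later,
# assuming a year later is worth 2x a year now.
value_now = 1.0
value_later = 2.0 * value_now

lost = 1.0 * value_now       # value of the year given up now
gained = 0.5 * value_later   # value of the half-year recovered later
print(gained - lost)         # 0.0 -> exactly breaking even
```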
My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3. If we had built GPT-3 in 2010, I think the world’s situation would probably have been better. We’d maybe have been at our current capability level in 2018; further scaling would have been going more slowly, because the community had already picked the low-hanging fruit and was doing bigger training runs; the world would have had more time to respond to the looming risk; and we would have done more good safety research.
If I had to steelman the view, I’d go with Paul’s argument here: https://www.lesswrong.com/posts/4Pi3WhFb4jPphBzme/don-t-accelerate-problems-you-re-trying-to-solve?commentId=z5xfeyA9poywne9Mx