It was brought to my attention that not everyone might use the concept of a “compute overhang” the same way.
In my terminology, there’s a (probabilistic) compute overhang to the degree that the following could happen: we invent an algorithm that will get us to TAI before we even max out compute scaling as much as we currently could.
So, on my definition, there are two ways in which we might already be in a compute overhang:
(1) Timelines are very short and we could get TAI with “current algorithms” (not necessarily GPT_n with zero tweaks, but obvious things to try that require no special insight) with less scaling effort than a Manhattan project.
(2) We couldn’t get TAI with current algorithms via any less-than-maximal scaling effort (and maybe not even with a maximal one – that part isn’t relevant for the claim), but there are highly significant algorithmic insights waiting for us (that we have a realistic chance of discovering). Once we incorporate these insights, we’ll be in the same situation as described in (1).
I would’ve guessed that Sam Altman was using it the same way, but now I’m not sure anymore.
I guess another way to use the concept is the following:
Once we build AGI with realistic means, using far-from-optimal algorithms, how much room is there for it to improve its algorithms during “takeoff”/intelligence explosion? “Compute overhang” here describes the gap between compute used to build AGI in the first place vs. more efficient designs that AI-aided progress could quickly discover.
On that definition, it’s actually quite straightforward that shorter timelines imply less compute overhang.
Also, this second definition arguably matches more closely the context in Bostrom’s Superintelligence, where I first came across the concept of a “hardware overhang.” Bostrom introduced it while discussing hard takeoff vs. soft takeoff.
(To complicate matters, there’s been a shift in takeoff speeds discussions where many people are now talking about pre-TAI/pre-AGI speeds of progress, whereas Bostrom was originally focusing on claims about post-AGI speeds of progress.)