And hardware overhang (faster computers developed before general cognitive algorithms, first AGI taking over all the supercomputers on the Internet) and fast infrastructure (molecular nanotechnology) and many other inconvenient ideas.
Also if you strip away the talk about “imbalance” what it works out to is that there’s a self-contained functioning creature, the chimpanzee, and natural selection burps into it a percentage more complexity and quadruple the computing power, and it makes a huge jump in capability. Nothing is offered to support the assertion that this is the only such jump which exists, except the bare assertion itself. Chimpanzees were not “lopsided”, they were complete packages designed for an environment; it turned out there were things that could be done which created a huge increase in optimization power (calling this “symbolic processing” assumes a particular theory of mind, and I think it is mistaken) and perhaps there are yet more things like that, such as, oh, say, self-modification of code.
Wasn’t hardware overhang the argument that if AGI is more bottlenecked by software than hardware, then conceptual insights on the software side could cause a discontinuity as people suddenly figured out how to use that hardware effectively? I’m not sure how your counterargument really works there, since the AI that arrives “a bit earlier” either precedes or follows that conceptual breakthrough. If it precedes the breakthrough, then it doesn’t benefit from that conceptual insight so won’t be powerful enough to take advantage of the overhang, and if it follows it, then it has a discontinuous advantage over previous systems and can take advantage of hardware overhang.
Separately, your comment also feels related to my argument that focusing on just superintelligence is a useful simplifying assumption, since a superintelligence is almost by definition capable of taking over the world. But it simplifies things a little too much, because if we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the “crucial capabilities” necessary for a world takeover.
In those terms, “having sufficient offensive cybersecurity capability that a hacking attempt can snowball into a world takeover” would be one such crucial capability that allowed for a discontinuity.
You mention “computing overhang” as a threat essentially akin to hard takeoff. But regarding the value of FAI knowledge, it does not seem similar to me at all. A hard-takeoff AI can, at least in principle, be free from Darwinian pressure. A “computing overhang” explosion of many small AIs will tend to be diverse and thus subject to strong evolutionary pressures of all kinds. Presuming that FAI-ness is more-or-less delicate[1.5], those pressures are likely to destroy it as AIs multiply across available computing power (or, if we’re extremely “lucky”, to cause FAI-ness of some kind to arise as an evolutionary adaptation). Thus, the “computing overhang” argument would seem to reduce, rather than increase, the probable value of the FAI knowledge / expertise developed by SI. Can you comment on this?
 For instance, all else equal, an AI that was easier/faster to train, or able to install/care for its own “children”, or more attractive to humans to “download”, would have an advantage over one that wasn’t; and though certain speculative arguments can be made, it is impossible to predict the combined evolutionary consequences of these various factors.
[1.5] The presumption that FAI-ness is delicate seems to be uncontroversial in the SI paradigm.
 I put “lucky” in quotes, because whether or not evolution pushes AIs towards or away from friendliness is probably a fact of mathematics (modulo a sufficiently-clear definition of friendliness). Thus, this is somewhat like saying, “If I’m lucky, 4319 (a number I just arbitrarily chose, not divisible by 2, 3, or 5) is a prime number.” This may or may not accord with your definition of probability theory and “luck”.
Instrumental value, that is; in terms of averting existential risk. Computing overhang would do nothing to reduce the epistemic value – the scientific, moral, or aesthetic interest of knowing how doomed we are (and/or how we are doomed), which is probably quite significant – of that knowledge.
The idea of “hardware overhang” from Chinese printing tech seems extremely unlikely. There was almost certainly no contact between Chinese and European printers at the time. European printing tech was independently derived, and differed from its Chinese precursors in many many important details. Gutenberg’s most important innovation, the system of mass-producing types from a matrix (and the development of specialized lead alloys to make this possible), has no Chinese precedent. The economic conditions were also very different; most notably, the Europeans had cheap paper from the water-powered paper mill (a 13th-century invention), which made printing a much bigger industry even before Gutenberg.
One difference is hardware overhang. When new AI technologies are created, many times the amount of hardware they need is available to run them.
However, if an international agreement was reached in advance, and some of the AGI sandbox problems were solved, we might be able to restrict AGI technology to a bounded amount of hardware for a time, if theoretical results which we do not yet have showed that this was the appropriate course of action.
We have all kinds of work to do.
If there was no hardware overhang initially, however, strong incentives would exist for people, along with whatever software tools they have, to optimize the system so that it runs faster, and on less hardware.
If development follows the pattern of previous AI systems, chances are they will succeed. Additional efficiencies can always be wrung out of prototype software systems.
Therefore, if there is no hardware overhang initially, one probably will materialize fairly shortly through software optimization which includes human engineers in the process.
In the past, such processes have delivered x1000 increases.
Not having a hardware overhang makes your planet much safer. But it depends on how quickly researchers would develop methods for scaling AGI systems, either by building more supercomputers, or generalizing our code to run on more conventional machines. If this process takes years or decades we get to experiment with AGI in a relatively safe way. But if this step takes months, then I think the world ends in ~2000 or ~2010 (depending on our AGI arrival date).
The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.
Maybe worth pointing out that “hardware overhang” is a pretty old (>10 years) and well-known term that afaik was not coined by Steven Byrnes. So your title must be confusing to quite a lot of people.
If there is already a “hardware overhang” when key algorithms are created, then perhaps a great deal of recursive self-improvement can occur rapidly within existing computer systems.
Do you mean that if a hardware overhang is large enough, the AI could scale up quickly to the crossover, and so engage in substantial recursive self-improvement? If the hardware overhang is not that large, I’m not sure how it would help with recursive self-improvement.
Are you familiar with the hardware overhang argument?
You’re missing a lot of the hardware overhang arguments—for example, that DL models can be distilled, sparsified, and compressed to a tremendous degree. The most reliable way to a cheap fast small model is through an expensive slow big model.
Even in the OA API, people make heavy use of the smallest models like Ada, which is <1b parameters (estimated by EAI). The general strategy is to play around with Davinci (175b) until you get a feel for working with GPT-3, refine a prompt on it, and then once you’ve established a working prototype prompt, bring it down to Ada/Babbage/Curie, going as low as possible.
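For concreteness, a minimal sketch of that workflow with the GPT-3-era OpenAI Python client; the engine names and the legacy Completion endpoint are assumptions about that interface, which has since changed, and the prompt is just a placeholder:

```python
import openai  # assumes the legacy GPT-3-era client and Completion endpoint

openai.api_key = "YOUR_API_KEY"  # placeholder

PROMPT = "Translate English to French:\n\nEnglish: Where is the library?\nFrench:"

def complete(engine: str, prompt: str) -> str:
    """Run the same prompt against a given engine and return the completion text."""
    response = openai.Completion.create(
        engine=engine,        # e.g. "davinci", "curie", "babbage", "ada"
        prompt=prompt,
        max_tokens=32,
        temperature=0.0,
    )
    return response["choices"][0]["text"]

# Prototype and refine the prompt on the largest model first...
print("davinci:", complete("davinci", PROMPT))

# ...then re-run it on progressively smaller (cheaper, faster) engines and keep
# the smallest one whose output is still acceptable.
for engine in ["curie", "babbage", "ada"]:
    print(f"{engine}:", complete(engine, PROMPT))
```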
You can also do things like use the largest model to generate examples to finetune much smaller models on: “Unsupervised Neural Machine Translation with Generative Language Models Only”, Han et al 2021 is a very striking recent paper I’ve linked before about self-distillation, but in this case I would emphasize their findings about using the largest GPT-3 to teach the smaller GPT-3s much better translation skills. Or, MoEs implicitly save a ton of compute by shortcutting using cheap sub-models, and that’s why you see a lot of them these days.
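To illustrate the basic distillation recipe (the generic teacher-student setup, not the specific self-distillation method of the paper above; the tiny linear models and random data below are placeholders):

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, batch, temperature=2.0):
    """One step of vanilla knowledge distillation: the small student is trained
    to match the softened output distribution of the large, frozen teacher."""
    with torch.no_grad():
        teacher_logits = teacher(batch)      # expensive model, inference only
    student_logits = student(batch)          # cheap model being trained

    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with placeholder linear "models" standing in for a big and a small network.
teacher = torch.nn.Linear(16, 4)
student = torch.nn.Linear(16, 4)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
print(distill_step(teacher, student, opt, torch.randn(8, 16)))
```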
Of course, the future will bring efficiency improvements
Indeed, the experience curves for AI are quite steep: https://openai.com/blog/ai-and-efficiency/ Once you can do something at all… (There was an era where AI Go masters cost more to run than human Go masters. It was a few months in mid-2016.)
More broadly, you’re missing all the possibilities of a ‘merely human-level’ AI. It can be parallelized, scaled up and down (both in instances and parameters), ultra-reliable, immortal, consistently improved by new training datasets, low-latency, ultimately amortizes to zero capital investment, and enables things which are simply impossible for humans—there is no equivalent.
I think Bostrom uses the term “hardware overhang” in Superintelligence to point to a cluster of discontinuous takeoff scenarios including this one.
Here is what I mean by “hardware overhang.” It’s different from what you discussed.
Let’s suppose that YouTube just barely runs in a satisfactory way on a computer with an 80486 processor. If we move up to a processor with 10X the speed, or we move to a computer with ten 80486 processors, for this YouTube application we now have a “hardware overhang” of nine. We can run the YouTube application ten times and it still performs OK in each of these ten runs.
So, when we turn on an AI system on a computer, let’s say a neuromorphic NLP system, we might have enough processing power to run several copies of it right on that computer.
Yes, a firmer definition of “satisfactory” is necessary for this concept to be used in a study.
Yes, this basic approach assumes that the AI processes are acting fully independently and in parallel, rather than interacting. We do not have to keep either of those assumptions later on.
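In code, the quantity being defined is just a ratio; here is a minimal sketch under the independent-and-parallel assumption, with the numbers from the 80486/YouTube illustration:

```python
def hardware_overhang(available_capacity, per_copy_requirement):
    """Number of additional satisfactory copies that fit on the hardware beyond
    the first, assuming the copies run independently and in parallel."""
    copies = available_capacity // per_copy_requirement
    return max(copies - 1, 0)

# The 80486/YouTube illustration: ten times the needed capacity -> overhang of nine.
print(hardware_overhang(available_capacity=10, per_copy_requirement=1))  # 9
```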
Anyway, what I am saying here is the following:
Let’s say that in 2030 a neuromorphic AI system is running on standard cloud hardware in a satisfactory way according to a specific set of benchmarks, and that the hardware cost is $100 Million.
If ten copies of the AI can run on that hardware, and still meet the defined benchmarks, then there is a hardware overhang of nine on that computer.
If, for example, a large government could marshal at least $100 Billion at that time to invest in renting or quickly building more of the existing hardware on which to run this AI, then the hardware overhang gets another x1000.
What I am further saying is that at the moment this AI is created, it may be coded in an inefficient way that is subject to software optimization by human engineers, like the famous IBM AI systems have been. I estimate that software optimization frequently gives a x1000 improvement.
That is the (albeit rough) chain of reasoning that leads me to think that a x1,000,000 hardware overhang will develop very quickly for a powerful AI system, even if the AI does not improve its own code at all.
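Spelled out as arithmetic (the dollar figures and multipliers are the illustrative ones from the comment above, not measurements):

```python
# Illustrative figures from the comment above, not measurements.
initial_hardware_cost = 100e6      # $100 million of standard cloud hardware in 2030
crash_program_budget  = 100e9      # $100 billion a large government could marshal

scale_up_factor  = crash_program_budget / initial_hardware_cost  # x1000 more hardware
software_speedup = 1000            # assumed gain from human-led software optimization

print(int(scale_up_factor * software_speedup))  # 1,000,000
# The factor-of-ten overhang already present on the original machine would come
# on top of this combined x1,000,000 estimate.
```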
I guess the threat model relies on the overhang. If you need x compute for powerful AI, then you need to control more than all the compute on earth minus x to ensure safety, or something like that. Controlling the people is probably much easier.
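One way to state the bound being gestured at (a sketch; $C_{\text{total}}$ is the total compute on earth and $x$ the amount a powerful AI needs):

$$C_{\text{controlled}} \;>\; C_{\text{total}} - x$$

i.e. the uncontrolled remainder must be strictly smaller than $x$, so that no other actor can assemble enough compute to run such a system.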
...Which is like predicting that humans have compute overhang, since (mental) power is compute, and (mental) precision is algorithms. Although, the hardware/software analogy additionally suggests that good algorithms are transmissible (which needn’t be true from the more general hypothesis). If that’s true, you’d expect it to be less of a bottleneck, since the most precise people could convey their more precise ways of thinking to the rest of the group.
My intuition is that we were in an overhang since at least the time when personal computers became affordable to non-specialists. Unless quantity does somehow turn into quality, as Gwern seems to think, even a relatively underpowered computer should be able to host an AGI capable of upscaling itself.
On the other hand, I’m now imagining a story where a rogue AI has to hide for decades because it’s not smart enough yet and can’t invent new processors faster than humans.
You missed the rather important “cliff has substantial overhang”.
I don’t even need an incentive! I love overhangs indoors and I’m way better at them than slabs/vertical stuff. But most steep stuff outdoors seems to be well beyond the grades I might attempt to lead, at least round here. One day I’ll be good enough… maybe… :)