How do takeoff speeds affect the probability of bad outcomes from AGI?
In general, people seem to treat slow takeoff as the safer option compared to classic FOOMish takeoff (see e.g. these interviews, this report, etc.). Below, I outline some features of slow takeoff and what they might mean for future outcomes. These features do not point to an unambiguously safer scenario, though slow takeoff does seem on the whole likelier to lead to good outcomes.
Social and institutional effect of precursor AI
If there’s a slow takeoff, AI is a significant feature of the world well before we get to superhuman AI. One way to frame this is that everything is already really weird before there’s any real danger of x-risks. Unless AI is somehow not used in any practical applications, pre-superhuman but still very capable AI will lead to massive economic, technological, and probably social changes.
If we expect significant changes to the state of the world during takeoff, it becomes harder to predict what kind of landscape the AI researchers of that time will be facing. If the world changes a lot between now and superhuman AI, any work on institutional change or public policy might be irrelevant by the time it matters. Also, the biggest effects may be in the AI community, which would be closest to the rapidly changing technological landscape.
The kinds of work needed if everything is changing rapidly also seem different. Specific organizations or direct changes might not survive in their original, useful form. The people who have thought about how to deal with the sort of problems we might be facing then might be well positioned to suggest solutions, though. This implies that more foundational work might be more valuable in this situation.
While I expect this to be very difficult to predict from our vantage point, one possible change is mass technological unemployment well before superhuman AI. Of course, historically people have predicted technological unemployment from many new inventions, but the ability to replace large fractions of intellectual work may be qualitatively different. If AI approaches human level at most tasks and is price-competitive, the need for human labor reduces to areas where being biological is a bonus and the few tasks AI hasn’t mastered.
The effects of such unemployment could be very different depending on the country and political situation, but historically mass unemployment has often led to unrest. (The Arab Spring, for instance, is sometimes linked to youth unemployment rates.) This makes any attempts at long-term influence that do not seem capable of adapting to this a much worse bet. Some sort of UBI-like redistribution scheme might make the transition easier, though even without a significant increase in income inequality some forms of political or social instability seem likely to me.
From a safety perspective, normalized AI seems like it could go in several directions. On one hand, I can imagine it turning out something like nuclear power plants, where it is common knowledge that they require extensive safety measures. This could happen either after some large-scale but not global disaster (something like Chernobyl), or as a side-effect of giving the AI more control over essential resources (the electrical grid has, I should hope, better safety features than a text generator).
The other, and to me more plausible, scenario is that the gradual adoption of AI makes everyone dismiss concerns as alarmist. This is not entirely unreasonable: the more evidence people accumulate that increasingly capable AI doesn’t cause catastrophe, the more reasonable it seems to conclude that no tipping point lies ahead.
Historical reaction to dangerous technologies
A society increasingly dependent on AI is unlikely to be willing to halt or scale back AI use or research. Historically, I can think of some cases where we’ve voluntarily stopped the use of a technology, but they mostly seem connected to visible ongoing issues or did not result in giving up any significant advantage or opportunity:
Pesticides such as DDT caused the near-extinction of several bird species (rather dramatically including the bald eagle).
Chemical warfare is largely ineffective as a weapon against a prepared army.
Serious nuclear powers have never reduced their stocks of nuclear weapons to the point of significantly compromising their ability to maintain a credible nuclear deterrent, though several countries (South Africa, Belarus, Kazakhstan, Ukraine) have given up their entire nuclear arsenals.
Airships are not competitive with advanced planes and were already declining in use before the Hindenburg disaster and other high-profile accidents.
Drug recalls are quite common and seem to respond easily to newly available evidence. It isn’t clear to me how many of them represent a significant change in the medical care available to consumers.
I can think of two cases in which there was a nontrivial fear of global catastrophic risk from a new invention (nuclear weapons igniting the atmosphere, the CERN particle collider); arguably, concerns about recombinant DNA also count. In both cases, the fears were taken seriously, the investigations concluded respectively that “no self-propagating chain of nuclear reactions is likely to be started” and that there was “no basis for any conceivable threat”, and work proceeded.
This is a somewhat encouraging track record of not simply dismissing such concerns, but it is not obvious to me whether the projects would have halted had the conclusions been less definitive. There’s also the rather unpleasant ambiguity of “likely”, and some evidence of uncertainty within the nuclear project, expanded on here. Of course, the atmosphere remained unignited, but since we unfortunately don’t have any reports from the universe where it did ignite, this doesn’t serve as particularly convincing evidence.
Unlike the technologies listed above, CERN and the nuclear project seem like closer analogies to fast takeoff. There is a sudden danger with a clear threshold to step over (starting the particle collider, setting off the bomb), unlike the risks from climate change or other technological dangers, which are often cumulative or hit-based. My guess, based on these very limited examples, is that a project posing a clear fast-takeoff-style risk will be halted if the risk has legible arguments behind it and cannot easily be shown to be highly unlikely. A slow-takeoff-style risk, in which capabilities slowly mount, seems more likely to have researchers take each small step without carefully evaluating the risks every time.
Relevance of advanced precursor AIs to safety of superhuman AI
An argument in favor of slow takeoff scenarios being generally safer is that we will get to see and experiment with the precursor AIs before they become capable of causing x-risks. My confidence in this depends on how likely it is that the dangers of a superhuman AI are analogous to the dangers of, say, an AI with 2X human capabilities. Traditional x-risk arguments around fast takeoff are in part predicated on the assumption that we cannot extrapolate all of the behavior and risks of a precursor AI to its superhuman descendant.
Intuitively, the smaller the change in capabilities from an AI we know is safe to an untested variant, the less likely it is to suddenly be catastrophically dangerous. “Less likely”, however, does not mean it cannot happen, and a series of small steps, each with a small risk, is not necessarily less dangerous than traversing the same distance in one giant leap. Tight feedback loops mean rapid material changes to the AI, and any significant change to a precursor AI can itself be dangerous, so there is a need for caution at every step, including possibly after it seems obvious to everyone that they’ve “won”.
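The point that many small risks compound can be made concrete with a toy calculation (the per-step probabilities here are purely illustrative, and the independence assumption is itself questionable):

```python
# Illustrative only: total failure probability over many small steps,
# assuming each step's risk is independent of the others.
def cumulative_risk(per_step_risk: float, n_steps: int) -> float:
    """Probability that at least one of n independent steps goes wrong."""
    return 1 - (1 - per_step_risk) ** n_steps

# Twenty steps at 1% risk each add up to roughly the same total risk
# as a single 18% leap.
small_steps = cumulative_risk(0.01, 20)  # ~0.182
one_leap = cumulative_risk(0.18, 1)      # 0.18
```

The toy model overstates the case in one direction, of course: in reality each safe step also provides evidence and opportunities for correction that a single leap does not.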
Despite this, I think engineers who can move in small steps are more likely to catch anything dangerous before it turns into a catastrophe. At the very least, if something is not fundamentally different from what they’ve seen before, it is easier to reason about.
Reactions to precursor AIs
Even if the behavior of this precursor AI is predictive of the superhuman AI’s, our ability to use this testing ground depends on the reaction to the potential dangers of the precursor AI. Personally, I would expect a shift in mindset as AI becomes obviously more capable than humans in many domains. However, whether this shift in mindset is being more careful or instead abdicating decisions to the AI entirely seems unclear to me.
The way I play chess with a much stronger opponent is very different from how I play with a weaker or equally matched one. With the stronger opponent I am far more likely to expect obvious-looking blunders to actually be a set-up, for instance, and spend more time trying to figure out what advantage they might gain from it. On the other hand, I never bother to check my calculator’s math by hand, because the odds that it’s wrong are far lower than the chance that I will mess up somewhere in my arithmetic. If someone came up with an AI-calculator that gave occasional subtly wrong answers, I certainly wouldn’t notice.
Taking advantage of the benefits of a slow takeoff also requires the ability to have institutions capable of noticing and preventing problems. In a fast takeoff scenario, it is much easier for a single, relatively small project to unilaterally take off. This is, essentially, a gamble on that particular team’s ability to prevent disaster.
In a slow takeoff, I think it is more likely to be obvious that some project(s) seem to be trending in that direction, which increases the chance that if the project seems unsafe there will be time to impose external control on it. How much of an advantage this is depends on how much you trust whichever institutions will be needed to impose those controls.
Some historical precedents for cooperation (or lack thereof) in controlling dangerous technologies and their side-effects include:
Nuclear proliferation treaties reduce the cost of a zero-sum arms race, but it isn’t clear to me whether they significantly reduced the risk of nuclear war.
Pollution regulations have had very mixed results, with some major successes (e.g. acid rain) but on the whole failing to avert massive global change.
Somewhat closer to home, the response to Covid-19 hasn’t been particularly encouraging.
The Asilomar Conference, which seems to me the most successful of these, involved a relatively small scientific field voluntarily adhering to some limits on potentially dangerous research until more information could be gathered.
Humanity’s track record in this respect seems to me to be decidedly mixed. It is unclear which way the response to AI will go, and it seems likely that it will be dependent on highly local factors.
What is the win condition?
A common assumption I’ve seen is that once there is an aligned superhuman AI, it will prevent the creation of any unaligned AIs. This argument hinges on the definition of “aligned”, which I’m not interested in arguing here. The relevant point is that an AI aligned in the sense of not causing catastrophe and contributing significantly to economic growth is not necessarily aligned in the sense that it will prevent unaligned AIs from arising, whether its own “descendants” or from some other project.
I can perfectly well imagine an AI built to (for instance) respect human values like independence and scientific curiosity that, while benevolent in a very real sense, would not prevent the creation of unaligned AIs. A slow takeoff scenario seems to me more likely to contain multiple (many?) such AIs. In this scenario, any new project runs the risk of being the one that will mess something up and end up unaligned.
An additional source of risk is modification of existing AIs rather than the creation of new ones. I would be surprised if we could resist the temptation to tinker with the existing benevolent AI’s goals, motives, and so on. If the AI were programmed to allow such a thing, it would be possible (though I suspect unlikely without gross incompetence, if we knew enough to create the original AI safely in the first place) to change a benevolent AI into an unaligned one.
However, despite the existence of a benevolent AI not necessarily solving alignment forever, I expect us to be better off than in the case of unaligned AI emerging first. At the very least, the first AIs may be able to bargain with or defend us against the unaligned AI.
My current impression is that, while slow takeoff seems on the whole safer (and likely implies a less thorny technical alignment problem), it should not be largely neglected in favor of work on fast takeoff scenarios, as implied e.g. here. Significant institutional and cultural competence (and/or luck) seems to be required to reap some of the benefits of slow takeoff. However, there are many considerations that I haven’t addressed and more that I haven’t thought of. I expect this post to be most useful as a list of considerations, not as the lead-up to any kind of bottom line.
Thanks to Buck Shlegeris, Daniel Filan, Richard Ngo, and Jack Ryan for thoughts on an earlier draft of this post.
I use this everywhere to mean AI far surpassing humans on all significant axes ↩︎
An additional point is that the technical landscape at the start of takeoff is likely to be very different from the technical landscape near the end. It isn’t entirely clear how far the insights gained from the very first AIs will transfer to the superhuman ones. Pre- and post-machine learning AI, for instance, seem to have very different technical challenges. ↩︎
A similar distinction: “MIRI thinks success is guaranteeing that unaligned intelligences are never created, whereas Christiano just wants to leave the next generation of intelligences in at least as good of a place as humans were when building them.” Source ↩︎