How do takeoff speeds affect the probability of bad outcomes from AGI?

Introduction

In general, people seem to treat slow takeoff as the safer option compared to classic FOOMish takeoff (see e.g. these interviews, this report, etc.). Below, I outline some features of slow takeoff and what they might mean for future outcomes. They do not seem to point to an unambiguously safer scenario, though slow takeoff does seem on the whole likelier to lead to good outcomes.

Social and institutional effects of precursor AI

If there’s a slow takeoff, AI is a significant feature of the world well before we get to superhuman AI.[1] One way to frame this is that everything is already really weird before there’s any real danger of x-risks. Unless AI is somehow not used in any practical applications, pre-superhuman but still very capable AI will lead to massive economic, technological, and probably social changes.

If we expect significant changes to the state of the world during takeoff, it becomes harder to predict what kind of landscape the AI researchers of that time will be facing. If the world changes a lot between now and superhuman AI, any work on institutional change or public policy might be irrelevant by the time it matters. Also, the biggest effects may be in the AI community, which would be closest to the rapidly changing technological landscape.

The kinds of work needed if everything is changing rapidly also seem different. Specific organizations or direct changes might not survive in their original, useful form. People who have thought about how to deal with the sorts of problems we might be facing then could be well positioned to suggest solutions, though. This implies that more foundational work might be more valuable in this situation.

While I expect this to be very difficult to predict from our vantage point, one possible change is mass technological unemployment well before superhuman AI. Of course, people have historically predicted technological unemployment from many new inventions, but the ability to replace large fractions of intellectual work may be qualitatively different. If AI approaches human level at most tasks and is price-competitive, the need for humans shrinks to areas where being biological is a bonus and the few tasks it hasn’t mastered.[2]

The effects of such unemployment could be very different depending on the country and political situation, but historically mass unemployment has often led to unrest. (The Arab Spring, for instance, is sometimes linked to youth unemployment rates.) This makes any attempt at long-term influence that cannot adapt to such disruption a much worse bet. Some sort of UBI-like redistribution scheme might make the transition easier, though even without a significant increase in income inequality some forms of political or social instability seem likely to me.

From a safety perspective, normalized AI seems like it could go in several directions. On one hand, I can imagine it turning out something like nuclear power plants, where it is common knowledge that they require extensive safety measures. This could happen either after some large-scale but not global disaster (something like Chernobyl), or as a side-effect of giving the AI more control over essential resources (the electrical grid has, I should hope, better safety features than a text generator).

The other, and to me more plausible, scenario is that the gradual adoption of AI makes everyone dismiss concerns as alarmist. This does not seem entirely unreasonable: the more evidence people accumulate that increasingly capable AI doesn’t cause catastrophe, the less plausible it seems that a dangerous tipping point still lies ahead.

Historical reaction to dangerous technologies

A society increasingly dependent on AI is unlikely to be willing to halt or scale back AI use or research. Historically, I can think of some cases where we’ve voluntarily stopped the use of a technology, but they mostly seem connected to visible ongoing issues or did not result in giving up any significant advantage or opportunity:

  • Pesticides such as DDT caused the near-extinction of several bird species (rather dramatically including the bald eagle).

  • Chemical warfare is largely ineffective as a weapon against a prepared army.

  • Serious nuclear powers have never cut their stockpiles to the point of significantly weakening their ability to maintain a credible nuclear deterrent, though several countries (South Africa, Belarus, Kazakhstan, Ukraine) have given up their entire nuclear arsenals.

  • Airships are not competitive with advanced planes and were already declining in use before the Hindenburg disaster and other high-profile accidents.

  • Drug recalls are quite common and seem to respond readily to newly available evidence. It isn’t clear to me how many of them represent a significant change in the medical care available to consumers.

I can think of two cases in which there was a nontrivial fear of global catastrophic risk from a new invention (nuclear weapons igniting the atmosphere, CERN). Arguably, concerns about recombinant DNA also count. In both cases, the fears were taken seriously, the investigations concluded “no self-propagating chain of nuclear reactions is likely to be started” and “no basis for any conceivable threat” respectively, and the projects went ahead.

This is a somewhat encouraging track record of not simply dismissing such concerns as impossible, but it is not obvious to me whether the projects would have been halted had the conclusions been less definitive. There’s also the rather unpleasant ambiguity of “likely”, and some evidence of uncertainty in the nuclear project, expanded on here. Of course, the atmosphere remained unignited, but since we unfortunately don’t have any reports from the universe where it did, this doesn’t serve as particularly convincing evidence.

Unlike the technologies listed above, CERN and the nuclear project seem like closer analogies to fast takeoff. There is a sudden danger with a clear threshold to step over (starting the particle collider, setting off the bomb), unlike the risks from climate change or other technological dangers, which are often cumulative or hit-based. My guess, based on these very limited examples, is that a project which clearly poses a fast-takeoff-style risk will be halted if there are legible arguments behind the risk and it cannot easily be shown to be highly unlikely. A slow-takeoff-style risk, in which capabilities slowly mount, seems more likely to have researchers take each small step without carefully evaluating the risks every time.

Relevance of advanced precursor AIs to the safety of superhuman AI

An argument in favor of slow takeoff scenarios being generally safer is that we will get to see and experiment with the precursor AIs before they become capable of causing x-risks.[3] My confidence in this depends on how likely it is that the dangers of a superhuman AI are analogous to the dangers of, say, an AI with 2X human capabilities. Traditional x-risk arguments around fast takeoff are in part predicated on the assumption that we cannot extrapolate all of the behavior and risks of a precursor AI to its superhuman descendant.

Intuitively, the smaller the change in capabilities from an AI we know is safe to an untested variant, the less likely it is to suddenly be catastrophically dangerous. “Less likely”, however, does not mean it could not happen, and a series of small steps, each with a small risk, is not necessarily inherently less dangerous than traversing the same space in one giant leap. Tight feedback loops mean rapid material changes to the AI, and significant change to the precursor AI runs the risk of itself being dangerous, so there is a need for caution at every step, including possibly after it seems obvious to everyone that they’ve “won”.
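
As a toy illustration of the point about small steps (a back-of-the-envelope sketch that assumes, purely for the arithmetic, independent and equal per-step risks, which real development would not satisfy):

$$P(\text{catastrophe over } n \text{ steps}) = 1 - (1 - p)^n, \qquad \text{e.g. } 1 - (1 - 0.01)^{20} \approx 0.18$$

Twenty small steps that each carry a 1% chance of disaster accumulate to roughly the same overall risk as a single leap with an 18% chance; the small steps only come out ahead if each one is made correspondingly safer, or if earlier steps let us catch problems and reduce the risk of later ones.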

Despite this, I think that engineers who can move in small steps are more likely to catch anything dangerous before it can turn into a catastrophe. At the very least, if something is not fundamentally different from what they’ve seen before, it would be easier to reason about.

Reactions to precursor AIs

Even if the behavior of this precursor AI is predictive of the superhuman AI’s, our ability to use this testing ground depends on the reaction to the potential dangers of the precursor AI. Personally, I would expect a shift in mindset as AI becomes obviously more capable than humans in many domains. However, whether that shift means being more careful or instead abdicating decisions to the AI entirely seems unclear to me.

The way I play chess with a much stronger opponent is very different from how I play with a weaker or equally matched one. With the stronger opponent I am far more likely to expect obvious-looking blunders to actually be a set-up, for instance, and spend more time trying to figure out what advantage they might gain from it. On the other hand, I never bother to check my calculator’s math by hand, because the odds that it’s wrong are far lower than the chance that I will mess up somewhere in my arithmetic. If someone came up with an AI calculator that gave occasional subtly wrong answers, I certainly wouldn’t notice.

Taking advantage of the benefits of a slow takeoff also requires institutions capable of noticing and preventing problems. In a fast takeoff scenario, it is much easier for a single, relatively small project to unilaterally take off. This is, essentially, a gamble on that particular team’s ability to prevent disaster.

In a slow takeoff, I think it is more likely to be obvious that some project(s) are trending in that direction, which increases the chance that, if a project seems unsafe, there will be time to impose external controls on it. How much of an advantage this is depends on how much you trust whichever institutions will be needed to impose those controls.

Some historical precedents for cooperation (or lack thereof) in controlling dangerous technologies and their side-effects include:

  • Nuclear proliferation treaties reduced the cost of a zero-sum arms race, but it isn’t clear to me whether they significantly reduced the risk of nuclear war.

  • Pollution regulations have had very mixed results, with some major successes (e.g. acid rain) but on the whole failing to avert massive global change.

  • Somewhat closer to home, the response to Covid-19 hasn’t been particularly encouraging.

  • The Asilomar Conference, which seems to me the most successful of these, involved a relatively small scientific field voluntarily adhering to some limits on potentially dangerous research until more information could be gathered.

Humanity’s track record in this respect seems decidedly mixed to me. It is unclear which way the response to AI will go, and it seems likely to depend on highly local factors.

What is the win condition?

A common assumption I’ve seen is that once there is an aligned superhuman AI, it will prevent any unaligned AIs. This argument seems to hinge on the definition of “aligned”, which I’m not interested in arguing here. The relevant assumption is that an AI aligned in the sense of not causing catastrophe and contributing significantly to economic growth is not necessarily aligned in the sense that it will prevent unaligned AIs from occurring, whether its own “descendants” or the products of some other project.[4]

I can perfectly well imagine an AI built to (for instance) respect human values like independence and scientific curiosity that, while benevolent in a very real sense, would not prevent the creation of unaligned AIs. A slow takeoff scenario seems to me more likely to contain multiple (many?) such AIs. In this scenario, any new project runs the risk of being the one that will mess something up and end up unaligned.

An additional source of risk is modification of existing AIs rather than the creation of new ones. I would be surprised if we could resist the temptation to tinker with the existing benevolent AI’s goals, motives, and so on. If the AI were programmed to allow such a thing, it would be possible (though I suspect unlikely without gross incompetence, if we knew enough to create the original AI safely in the first place) to change a benevolent AI into an unaligned one.

However, even though the existence of a benevolent AI would not necessarily solve alignment forever, I expect us to be better off in that case than if an unaligned AI emerges first. At the very least, the first AIs may be able to bargain with, or defend us against, the unaligned AI.

Conclusion

My current impression is that, while slow takeoff seems on the whole safer (and likely implies a less thorny technical alignment problem), it should not be mostly neglected in favor of work on fast takeoff scenarios, as implied e.g. here. Significant institutional and cultural competence (and/or luck) seems to be required to reap some of the benefits of a slow takeoff. However, there are many considerations that I haven’t addressed and more that I haven’t thought of. I expect this post to be most useful as a list of considerations, not as the lead-up to any kind of bottom line.

Thanks to Buck Shlegeris, Daniel Filan, Richard Ngo, and Jack Ryan for thoughts on an earlier draft of this post.


  1. I use this everywhere to mean AI far surpassing humans on all significant axes. ↩︎

  2. See e.g. Robin Hanson’s Economic Growth Given Machine Intelligence ↩︎

  3. An additional point is that the technical landscape at the start of takeoff is likely to be very different from the technical landscape near the end. It isn’t entirely clear how far the insights gained from the very first AIs will transfer to the superhuman ones. Pre- and post-machine-learning AI, for instance, seem to have very different technical challenges. ↩︎

  4. A similar distinction: “MIRI thinks success is guaranteeing that unaligned intelligences are never created, whereas Christiano just wants to leave the next generation of intelligences in at least as good of a place as humans were when building them.” Source ↩︎