Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Link post

My forthcoming paper, “Disjunctive Scenarios of Catastrophic AI Risk”, attempts to introduce a number of considerations to the analysis of potential risks from Artificial General Intelligence (AGI). As the paper is long and occasionally makes for somewhat dry reading, I thought that I would briefly highlight a few of the key points raised in the paper.

The main idea here is that most of the discussion about risks of AGI has been framed in terms of a scenario that goes something along the lines of “a research group develops AGI, that AGI develops to become superintelligent, escapes from its creators, and takes over the world”. While that is one scenario that could happen, focusing too much on any single scenario makes us more likely to miss out on alternative scenarios. It also makes the scenarios susceptible to criticism from people who (correctly!) point out that we are postulating very specific scenarios that have lots of burdensome details.

To address that, I discuss here a number of considerations that suggest disjunctive paths to catastrophic outcomes: paths that are of the form “A or B or C could happen, and any one of them happening could have bad consequences”.

Superintelligence versus Crucial Capabilities

Bostrom’s Superintelligence, as well as a number of other sources, basically makes the following argument:

  1. An AGI could become superintelligent

  2. Superintelligence would enable the AGI to take over the world

This is an important argument to make and analyze, since superintelligence basically represents an extreme case: if an individual AGI can become as powerful as it gets, how do we prepare for that eventuality? As long as there is a plausible chance of such an extreme case being realized, it must be taken into account.

However, it is probably a mistake to focus only on the case of superintelligence. Basically, the reason we are interested in a superintelligence is that, by assumption, it has the cognitive capabilities necessary for a world takeover. But what about an AGI that had the cognitive capabilities necessary for taking over the world, and only those?

Such an AGI might not count as a superintelligence in the traditional sense, since it would not be superhumanly capable in every domain. Yet, it would still be one that we should be concerned about. If we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the crucial capabilities necessary for a world takeover.

That raises the question of what such crucial capabilities might be. I don’t have a comprehensive answer; in my paper, I focus mostly on the kinds of capabilities that could be used to inflict major damage: social manipulation, cyberwarfare, and biological warfare. Others no doubt exist.

A possibly useful framing for future investigations might be, “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous?”, where the definition of “dangerous” is free to vary based on how serious a risk we are concerned about. One complication here is that this is a highly contextual question: with a superintelligence we can assume that the AGI may become basically omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage on humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.

Crucial capabilities are a disjunctive consideration because they show that superintelligence isn’t the only level of capability that would pose a major risk: there are many different combinations of various capabilities – including ones that we don’t even know about yet – that could pose the same level of danger as superintelligence.

Incidentally, this shows one reason why the common criticism of “superintelligence isn’t something that we need to worry about because intelligence isn’t unidimensional” is unfounded – the AGI doesn’t need to be superintelligent in every dimension of intelligence, just the ones we care about.

How would the AGI get free and powerful?

In the prototypical AGI risk scenario, we are assuming that the developers of the AGI want to keep it strictly under control, whereas the AGI itself has a motive to break free. This has led to various discussions about the feasibility of “oracle AI” or “AI confinement” – ways to restrict the AGI’s ability to act freely in the world, while still making use of it. This also means that the AGI might have a hard time acquiring the resources that it needs for a world takeover, since it either has to do so while it is under constant supervision by its creators, or while on the run from them.

However, there are also alternative scenarios where the AGI’s creators voluntarily set it free – or even place it in control of, e.g., a major corporation, free to use that corporation’s resources as it desires! My chapter discusses several ways by which this could happen: i) economic benefit or competitive pressure, ii) criminal or terrorist reasons, iii) ethical or philosophical reasons, iv) confidence in the AI’s safety, as well as v) desperate circumstances, such as being otherwise close to death. See the chapter for more details on each of these. Furthermore, the AGI could remain theoretically confined but be practically in control anyway – such as in a situation where it was officially only giving a corporation advice, but its advice had never been wrong before and nobody wanted to risk their job by going against it.

Would the Treacherous Turn involve a Decisive Strategic Advantage?

Looking at crucial capabilities in a more fine-grained manner also raises the question of when an AGI would start acting against humanity’s interests. In the typical superintelligence scenario, we assume that it will do so once it is in a position to achieve what Bostrom calls a Decisive Strategic Advantage (DSA): “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination”. After all, if you are capable of achieving superintelligence and a DSA, why act any earlier than that?

Even when dealing with superintelligences, however, the case isn’t quite as clear-cut. Suppose that there are two AGI systems, each potentially capable of achieving a DSA if it prepares for long enough. The longer each prepares, however, the more likely it becomes that the other AGI sets its plans in motion first and gains the advantage. Thus, if several AGI projects exist, each AGI is incentivized to act at the point that maximizes its overall probability of success – even if its chances of succeeding in a takeover at that point were rather slim, as long as waiting longer would make those chances even worse.
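This trade-off can be illustrated with a toy model – my own, purely illustrative, and not from the paper, with all numbers made up: suppose an AGI’s chance of a successful takeover grows with preparation time, while each unit of waiting also carries some chance that a rival acts first. The optimal moment to act then maximizes the product of the two, and can arrive while the success probability is still quite low.

```python
# Toy model (illustrative only): when should an AGI act, if waiting improves
# its own capability but risks being pre-empted by a rival?

def expected_success(t, own_rate=0.05, rival_rate=0.15):
    """Overall success probability if acting at time t (arbitrary units)."""
    p_ready = 1 - (1 - own_rate) ** t       # own capability accumulates over time
    p_unpreempted = (1 - rival_rate) ** t   # chance the rival hasn't moved yet
    return p_ready * p_unpreempted

# Find the time step that maximizes overall success probability.
best_t = max(range(1, 50), key=expected_success)
print(best_t, round(expected_success(best_t), 3))
```

With these (made-up) parameters the optimum comes early, at t = 5, where the takeover succeeds only about 10% of the time – acting with slim chances still beats waiting for even worse ones.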

Indeed, an AGI which defects on its creators may not be going for a world takeover in the first place: it might, for instance, simply be trying to maneuver itself into a position where it can act more autonomously and defeat takeover attempts by other, more powerful AGIs. The threshold for the first treacherous turn could vary quite a bit, depending on the goals and assets of the different AGIs; various considerations are discussed in the paper.

A large reason for analyzing these kinds of scenarios is that, besides caring about existential risks, we also care about catastrophic risks – such as an AGI acting too early and launching a plan which resulted in “merely” hundreds of millions of deaths. My paper introduces the term Major Strategic Advantage, defined as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society”. A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities.

“Mere” catastrophic risks could also turn into existential ones, if they contribute to global turbulence (Bostrom et al. 2017), a situation in which existing institutions are challenged, and coordination and long-term planning become more difficult. Global turbulence could then contribute to another out-of-control AI project failing even more catastrophically and causing even more damage.

Summary table and example scenarios

The table below summarizes the various alternatives explored in the paper.

AI’s level of strategic advantage

  • Decisive

  • Major

AI’s capability threshold for non-cooperation

  • Very low to very high, depending on various factors

Sources of AI capability

  • Individual takeoff

    • Hardware overhang

    • Speed explosion

    • Intelligence explosion

  • Collective takeoff

  • Crucial capabilities

    • Biowarfare

    • Cyberwarfare

    • Social manipulation

    • Something else

  • Gradual shift in power

Ways for the AI to achieve autonomy

  • Escape

    • Social manipulation

    • Technical weakness

  • Voluntarily released

    • Economic or competitive reasons

    • Criminal or terrorist reasons

    • Ethical or philosophical reasons

    • Desperation

    • Confidence

      • in lack of capability

      • in values

  • Confined but effectively in control

Number of AIs

  • Single

  • Multiple

And here are some example scenarios formed by different combinations of them:

The classic takeover

(Decisive strategic advantage, high capability threshold, intelligence explosion, escaped AI, single AI)

The “classic” AI takeover scenario: an AI is developed, which eventually becomes better at AI design than its programmers. The AI uses this ability to undergo an intelligence explosion, and eventually escapes to the Internet from its confinement. After acquiring sufficient influence and resources in secret, it carries out a strike against humanity, eliminating humanity as a dominant player on Earth so that it can proceed with its own plans unhindered.

The gradual takeover

(Major strategic advantage, high capability threshold, gradual shift in power, released for economic reasons, multiple AIs)

Many corporations, governments, and individuals voluntarily turn over functions to AIs, until we are dependent on AI systems. These are initially narrow-AI systems, but continued upgrades push some of them to the level of having general intelligence. Gradually, they start making all the decisions. We know that letting them run things is risky, but now a lot of our infrastructure is built around them, they bring in a profit, and they’re really good at giving us nice stuff – for the time being.

The wars of the desperate AIs

(Major strategic advantage, low capability threshold, crucial capabilities, escaped AIs, multiple AIs)

Many different actors develop AI systems. Most of these prototypes are unaligned with human values and not yet enormously capable, but many of these AIs reason that some other prototype might be more capable. As a result, they attempt to defect on humanity despite knowing their chances of success to be low, reasoning that they would have an even lower chance of achieving their goals if they did not defect. Society is hit by various out-of-control systems with crucial capabilities that manage to do catastrophic damage before being contained.

Is humanity feeling lucky?

(Decisive strategic advantage, high capability threshold, crucial capabilities, confined but effectively in control, single AI)

Google begins to make decisions about product launches and strategies as guided by their strategic advisor AI. This allows them to become even more powerful and influential than they already are. Nudged by the strategy AI, they start taking increasingly questionable actions that increase their power; they are too powerful for society to put a stop to them. Hard-to-understand code written by the strategy AI detects and subtly sabotages other people’s AI projects, until Google establishes itself as the dominant world power.

This blog post was written as part of work for the Foundational Research Institute.