What if AI doesn’t quite go FOOM?

Intro

This article explores possible futures in a world where artificial intelligence turns out NOT to be able to quickly and recursively self-improve so as to influence our world with arbitrarily large strength and subtlety, i.e., to “go FOOM.” Note that I am not arguing that AI won’t FOOM. Eliezer has made several good arguments for why AI probably will FOOM, and I don’t necessarily disagree. I am simply calling attention to the non-zero probability that it won’t, and asking what we might do to prepare for a world in which it doesn’t.

Failure Modes

I can imagine three different ways in which AI could fail to FOOM in the next 100 years or so. Option 1 is a “human fail”: we destroy ourselves or succumb to some other existential risk before the first FOOM-capable AI boots up. I would love to hear in the comments section about (a) which existential risks people think are most likely to seriously threaten us before the advent of AI, and (b) what, if anything, a handful of people with moderate resources (i.e., people who hang around on Less Wrong) might do to effectively combat some of those risks.

Option 2 is a “hardware fail”: Moore’s Law turns out to have an upper bound. If physics doesn’t show enough complexity beneath the level of quarks, or if quantum-sized particles are so irredeemably random as to be intractable for computational purposes, then it might not be possible for even the most advanced intelligence to significantly improve on the basic hardware design of the supercomputers of, say, the year 2020. This would limit the computing power available per dollar, and so the level of computing power required for a self-improving AI might not be affordable for generations, if ever. Nick Bostrom has some interesting thoughts along these lines, ultimately guessing (as of 2008) that the odds of a superintelligence forming by 2033 were less than 50%.

Option 3 is a “software fail”: *programming* efficiency turns out to have an upper bound. If there are natural information-theoretic limits on how efficiently a set number of operations can be used to perform an arbitrary task, then it might not be possible for even the most advanced intelligence to significantly improve on its basic software design; the supercomputer would be more than ‘smart’ enough to understand itself and to rewrite itself, but there would simply not *be* an alternate script for the source code that was actually more effective.
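As a small, well-established example of the kind of limit I have in mind (my own illustration; it says nothing about AI source code specifically): no comparison-based sorting program, however cleverly its code is rewritten, can beat the information-theoretic lower bound on comparisons, because distinguishing all n! possible orderings requires at least log2(n!) yes/no answers.

```latex
% Worst-case comparisons needed by ANY comparison sort, no matter how
% ingeniously its source code is rewritten:
\[
  C_{\min}(n) \;\ge\; \log_2(n!) \;\ge\; n\log_2 n - n\log_2 e
\]
% so roughly n log n is a hard floor; "smarter code" cannot go below it.
```

Whether comparable floors exist for the tasks a self-improving AI actually cares about is exactly the open question behind Option 3.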

These three options are not necessarily exhaustive; they are just the possibilities that immediately occurred to me, with some help from User: JoshuaZ.

“Superintelligent Enough” AI

An important point to keep in mind is that even if self-improving AI faces hard limits before becoming arbitrarily powerful, AI might still be more than powerful enough to effortlessly dominate future society. I am sure my numbers are off by many orders of magnitude, but by way of illustration only, suppose that current supercomputers run at a speed of roughly 10^20 ops/second, and that successfully completing Eliezer’s coherent extrapolated volition project would require a processing speed of roughly 10^36 ops/second. There is obviously quite a lot of space here for a miniature FOOM. If one of today’s supercomputers starts to go FOOM and then hits hard limits at 10^25 ops/second, it wouldn’t be able to identify humankind’s CEV, but it might be able to, e.g., take over every electronic device capable of receiving transmissions, such as cars, satellites, and first-world factories. If this happens around the year 2020, a mini-FOOMed AI might also be able to take over homes, medical prosthetics, robotic soldiers, and credit cards.
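Purely to make those orders of magnitude concrete (the 10^20, 10^25, and 10^36 figures are, again, illustrative rather than real estimates), here is a minimal sketch of the gaps measured in Moore’s-Law-style doublings, assuming one doubling every 18 months:

```python
import math

current_ops = 1e20    # illustrative speed of a present-day supercomputer (ops/sec)
mini_foom_ops = 1e25  # illustrative hard ceiling hit by a "mini-FOOM"
cev_ops = 1e36        # illustrative requirement for computing humankind's CEV

def doublings(start: float, target: float) -> float:
    """How many doublings are needed to get from `start` to `target`."""
    return math.log2(target / start)

for label, target in [("mini-FOOM ceiling", mini_foom_ops),
                      ("CEV requirement", cev_ops)]:
    n = doublings(current_ops, target)
    print(f"{label}: {n:.0f} doublings, ~{1.5 * n:.0f} years at 18 months each")
```

The point is only that a ceiling at 10^25 still leaves about five orders of magnitude of headroom above today’s machines, which is plenty of room for the kind of takeover described above, while the CEV threshold sits another eleven orders of magnitude beyond that.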

Sufficient investments in security and encryption might keep such an AI out of some corners of our economy, but right now, major operating systems aren’t even proof against casual human trolls, let alone a dedicated AI thinking at faster-than-human speeds. I do not understand encryption well, and so it is possible that some plausible level of investment in computer security could, contrary to my assumptions, actually manage to protect human control over individual computers for the foreseeable future. Even if key industrial resources were adequately secured, though, a moderately super-intelligent AI might be capable of modeling the politics of current human leaders well enough to manipulate them into steering Earth onto a path of its choosing, as in Isaac Asimov’s The Evitable Conflict.

If enough superintelligences develop at close enough to the same moment in time and have different enough values, they might in theory reach some sort of equilibrium that does not involve any one of them taking over the world. As Eliezer has argued (scroll down to the second half of the linked page), though, the stability of a race between intelligent agents should mostly be expected to *decrease* as those agents swallow their own intellectual and physical supply chains. If a supercomputer can take over larger and larger chunks of the Internet as it gets smarter, or can effectively control what happens in more and more factories as it gets smarter, then there is less and less reason to think that supercomputing empires will “grow” at roughly the same pace: the first empire to grow to a given size is likely to grow faster than its rivals until it takes over the world. Note that this could happen even if the AI is nowhere near smart enough to start mucking about with uploaded “ems” or nanoreplicators. Even in a boringly normal near-future scenario, a computer with even modest self-improvement and self-aggrandizement capabilities might be able to take over the world. Imagine something like the ending to David Brin’s Earth, stripped of the mystical symbolism and the egalitarian optimism.
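To illustrate that instability claim with a toy model (my own, and nothing more than a cartoon of the qualitative dynamic), suppose two AI “empires” compete for a fixed pool of resources and each one’s growth per step rises with the resources it already controls. A small initial lead then compounds instead of washing out:

```python
# Toy model: two AI "empires" compete for a fixed resource pool, and each
# one's growth rate scales with what it already controls. All parameter
# values are arbitrary; only the qualitative outcome matters.

def final_share(head_start=1.05, rate=0.01, total=100.0, steps=10_000):
    a, b = head_start, 1.0                  # empire A starts with a 5% lead
    for _ in range(steps):
        free = max(total - a - b, 0.0)      # unclaimed resources remaining
        a += rate * a * a * free / total    # growth compounds with size...
        b += rate * b * b * free / total    # ...so the leader pulls ahead
    return a / (a + b)

print(f"Empire A's final share, from a 5% head start: {final_share():.0%}")
```

Under these arbitrary parameters the early leader ends up with the large majority of the pool, which is the qualitative point: a race between self-aggrandizing agents tends toward a single winner rather than a stable balance.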

Ensuring a “Nice Place to Live”

I don’t know what Eliezer’s timeline is for attempting to develop provably Friendly AI, but it might be worthwhile to attempt to develop a second-order stopgap. Eliezer’s CEV is supposed to function as a first-order stopgap; it won’t achieve all of our goals, but it will ensure that we all get to grow up in a Nice Place to Live while we figure out what those goals are. Of course, that only happens if someone develops a CEV-capable AI. Eliezer seems quite worried about the possibility that someone will develop a FOOMing unFriendly AI before Friendly AI can get off the ground, but is anything being done about this besides just rushing to finish Friendly AI?

Perhaps we need some kind of mini-FOOMing, marginally Friendly AI whose only goal is to ensure that nothing seizes control of the world’s computing resources until SIAI can figure out how to get CEV to work. Although no “utility function” can be specified for a general AI without risking paper-clip tiling, it might be possible to formulate a “homeostatic function” at relatively low risk. An AI that “valued” keeping the world looking roughly the way it does now, that was specifically instructed *never* to seize control of more than X units of each of several thousand different kinds of resources, and whose principal intended activity was to search for, hunt down, and destroy AIs that seemed to be growing too powerful too quickly might be an acceptable risk. Even if such a “shield AI” were not provably Friendly, it might pose a smaller risk of tiling the solar system than the status quo, since the status quo is full of irresponsible people who like to tinker with seed AIs.
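To make the “homeostatic function” idea slightly more concrete, here is a minimal sketch, entirely my own illustration and nowhere near a real safety design: a score that penalizes drift of the world away from a reference snapshot, and that is clamped so that holding resources beyond fixed caps can never improve it. Every name and structure below is a hypothetical placeholder.

```python
# Minimal sketch of a "homeostatic" score, as opposed to an open-ended utility
# function. Everything here (the state representation, the caps, the penalty)
# is a hypothetical placeholder, not a workable safety mechanism.

from dataclasses import dataclass

@dataclass
class WorldState:
    features: dict[str, float]        # coarse measurements of the world
    resources_held: dict[str, float]  # resources the AI itself controls

def homeostatic_score(state: WorldState, baseline: WorldState,
                      resource_caps: dict[str, float]) -> float:
    # Hard constraint: exceeding any resource cap is maximally bad, so
    # "grab more resources" is never an instrumentally useful plan.
    for name, cap in resource_caps.items():
        if state.resources_held.get(name, 0.0) > cap:
            return float("-inf")
    # Otherwise the score is highest when the world looks like the baseline;
    # any drift away from the reference snapshot is penalized.
    drift = sum((state.features[k] - baseline.features.get(k, 0.0)) ** 2
                for k in state.features)
    return -drift
```

The intended shape is that the best such an AI can ever do is keep things roughly as they are, so seizing additional resources never pays; of course, “keep things roughly as they are” hides enormous specification problems of its own.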

An interesting side question is whether this would be counterproductive in a world where Failure Mode 2 (hard limits on hardware) or Failure Mode 3 (hard limits on software) were serious concerns. Assuming a provably Friendly AI can eventually be developed, then, several years after that, millions of people could likely be convinced that activating it would be really good, and humans might be able to dedicate enough resources to specifically overcome the second-order stopgap “shield AI” that was knocking out other people’s un-provably Friendly AIs. But if the shield AI worked too well and got too close to the hard upper bound on the power of an AI, then it might not be possible to unmake the shield, even with added resources and with no holds barred.