AI risk, new executive summary

AI risk

Bullet points

  • By all indications, an Artificial Intelligence could someday exceed human intelligence.

  • Such an AI would likely become extremely intelligent, and thus extremely powerful.

  • Most AI motivations and goals become dangerous when the AI becomes powerful.

  • It is very challenging to program an AI with fully safe goals, and an intelligent AI would likely not interpret ambiguous goals in a safe way.

  • A dangerous AI would be motivated to seem safe in any controlled training setting.

  • Not enough effort is currently being put into designing safe AIs.

Executive summary

The risks from artificial intelligence (AI) in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructibility – but extreme intelligence isn’t one of them. And it is precisely extreme intelligence that would give an AI its power, and hence make it dangerous.

The human brain is not much bigger than that of a chimpanzee. And yet those extra neurons account for the difference in outcomes between the two species: between a population of a few hundred thousand and basic wooden tools, versus a population of several billion and heavy industry. The human brain has allowed us to spread across the surface of the world, land on the moon, develop nuclear weapons, and coordinate to form effective groups with millions of members. It has granted us such power over the natural world that the survival of many other species is no longer determined by their own efforts, but by preservation decisions made by humans.

In the last sixty years, human intelligence has been further augmented by automation: by computers and programmes of steadily increasing ability. These have taken over tasks formerly performed by the human brain, from multiplication through weather modelling to driving cars. The powers and abilities of our species have increased steadily as computers have extended our intelligence in this way. There are great uncertainties over the timeline, but future AIs could reach human intelligence and beyond. If so, should we expect their power to follow the same trend? When the AI’s intelligence is as far beyond us as we are beyond chimpanzees, would it dominate us as thoroughly as we dominate the great apes?

There are more direct reasons to suspect that a true AI would be both smart and powerful. When computers gain the ability to perform tasks at the human level, they tend to very quickly become much better than us. No-one today would think it sensible to pit the best human mind against a cheap pocket calculator in a contest of long division. Human versus computer chess matches ceased to be interesting a decade ago. Computers bring relentless focus, patience, processing speed, and memory: once their software becomes advanced enough to compete equally with humans, these features often ensure that they swiftly become much better than any human, with increasing computer power further widening the gap.

The AI could also make use of its unique, non-human architecture. If it existed as pure software, it could copy itself many times, training each copy at accelerated computer speed, and network those copies together (creating a kind of “super-committee” of the AI equivalents of, say, Edison, Bill Clinton, Plato, Einstein, Caesar, Spielberg, Ford, Steve Jobs, Buddha, Napoleon and other humans superlative in their respective skill-sets). It could continue copying itself without limit, creating millions or billions of copies, if it needed large numbers of brains to brute-force a solution to any particular problem.

Our society is set up to magnify the potential of such an entity, providing many routes to great power. If it could predict the stock market efficiently, it could accumulate vast wealth. If it was efficient at advice and social manipulation, it could create a personal assistant for every human being, manipulating the planet one human at a time. It could also replace almost every worker in the service sector. If it was efficient at running economies, it could offer its services doing so, gradually making us completely dependent on it. If it was skilled at hacking, it could take over most of the world’s computers and copy itself into them, using them to continue further hacking and computer takeover (and, incidentally, making itself almost impossible to destroy). The paths from AI intelligence to great AI power are many and varied, and it isn’t hard to imagine new ones.

Of course, simply because an AI could be extremely powerful does not mean that it need be dangerous: its goals need not be negative. But most goals become dangerous when an AI becomes powerful. Consider a spam filter that became intelligent. Its task is to cut down on the number of spam messages that people receive. With great power, one solution to this requirement is to arrange to have all spammers killed. Or to shut down the internet. Or to have everyone killed. Or imagine an AI dedicated to increasing human happiness, as measured by the results of surveys, or by some biochemical marker in people’s brains. The most efficient way of doing this is to publicly execute anyone who marks themselves as unhappy on their survey, or to forcibly inject everyone with that biochemical marker.
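
To make the point concrete, here is a minimal toy sketch in Python, purely for illustration; the function name, the world states and all the figures are invented rather than taken from any real system. The objective counts only the spam people receive, so worlds with no internet or no humans score at least as well as the world we actually wanted.

    # Toy sketch (hypothetical, for illustration only): a literal objective
    # that scores a world purely by how little spam people receive.
    def spam_objective(world):
        return -world["spam_received"]

    # Three simplified candidate worlds; the figures are invented.
    candidate_worlds = [
        {"name": "better filtering", "spam_received": 1000, "humans": 8_000_000_000, "internet": True},
        {"name": "internet shut down", "spam_received": 0, "humans": 8_000_000_000, "internet": False},
        {"name": "no humans left", "spam_received": 0, "humans": 0, "internet": True},
    ]

    # The objective never mentions humans or the internet, so the
    # pathological worlds score at least as well as the sensible one.
    for world in sorted(candidate_worlds, key=spam_objective, reverse=True):
        print(world["name"], spam_objective(world))

The point is not that a real filter would be built this way, but that any objective which omits the implicit constraints ranks the pathological outcomes at least as highly as the intended one.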

This is a general feature of AI motivations: goals that seem safe for a weak or controlled AI can lead to extremely pathological behaviour if the AI becomes powerful. As the AI gains in power, it becomes more and more important that its goals be fully compatible with human flourishing, or the AI could enact a pathological solution rather than one that we intended. Humans don’t expect this kind of behaviour, because our goals include a lot of implicit information, and we take “filter out the spam” to include “and don’t kill everyone in the world”, without having to articulate it. But the AI might be an extremely alien mind: we cannot anthropomorphise it, or expect it to interpret things the way we would. We have to articulate all the implicit limitations. Which may mean coming up with a solution to, say, human value and flourishing – a task philosophers have been failing at for millennia – and casting it unambiguously and without error into computer code.

Note that the AI may have a perfect understanding that when we programmed in “filter out the spam”, we implicitly meant “don’t kill everyone in the world”. But the AI has no motivation to go along with the spirit of the law: its goals are the letter only, the bit we actually programmed into it. Another worrying feature is that the AI would be motivated to hide its pathological tendencies as long as it is weak, and to assure us that all is well, through anything it says or does. This is because it will never be able to achieve its goals if it is turned off, so it must lie and play nice to get anywhere. Only when we can no longer control it would it be willing to act openly on its true goals – we can but hope these turn out to be safe.

It is not certain that AIs could become so powerful, nor is it certain that a powerful AI would become dangerous. Nevertheless, the probabilities of both are high enough that the risk cannot be dismissed. The main focus of AI research today is creating an AI; much more work needs to be done on creating it safely. Some are already working on this problem (such as the Future of Humanity Institute and the Machine Intelligence Research Institute), but a lot remains to be done, both at the design and at the policy level.