AI Reading Group Thoughts (1/?): The Mandate of Heaven

My housemate Kelsey “theunitofcaring” has begun hosting an AI reading group in our house. Our first meeting was yesterday evening, and over a first-draft attempt at chocolate macarons, we discussed this article about AI safety and efficiency by Paul Christiano, and various ideas prompted thereby at greater or lesser remove.

One idea that came up is what we decided to call “tipping point AI” (because apparently there are a lot of competing definitions for “transformative” AI). The definition we were using for tipping point AI was “something such that it, or its controller, is capable of preventing others from building AIs”. The exact type and level of capability here could vary—for instance, if it’s built after we’ve colonized Mars (that is, colonized it to an extent such that Martians could undertake projects like building AIs), then a tipping point AI has to be able to project power to Mars in some form, even if the only required level of finesse is lethality. But if it’s built before we’ve colonized Mars, it doesn’t need that reach; it just needs to be able to prevent colonization projects in addition to AI projects.

One hypothesis that has been floated, in a context such that we are pretty sure it is not anyone’s real plan, is that an AI could just destroy all the GPUs on the planet and prevent the manufacture of new ones. This would be bad for Bitcoin, video games, and AI projects, but otherwise relatively low-impact. An AI might be able to accomplish this task by coercion, or even by proxy—the complete system of “the AI, and its controller” needs to be able to prevent AI creation by other agents, so the AI itself might only need to identify targets for a controller who already wields enough power to fire missiles or confiscate hardware and chooses to do so in service of this goal, perhaps the US government.

The idea behind creating tipping point AI isn’t that this is where we stop forever. The tipping point AI only has to prevent other agents from building their own in their basements. It eliminates competition. Some features of a situation in which a tipping point AI exists include:

  • The agent controlling the AI can work on more sophisticated second drafts without worrying about someone else rushing to production unsafely.

  • The controlling agent can publish insights and seek feedback without worrying about plagiarism, code forks, etc.

  • They can apply the AI’s other abilities, if any (there will presumably be some, since “prevent AI creation” is not a primitive action—some surveillance capability seems like a minimum to me), to their other problems, perhaps including creating a better AI. Even if this application has economic or other benefits that might attract others to similar solutions by default, the AI will prevent that, so no one will be (productively) startled or inspired into working on AI faster by seeing the results.

However, if you’re an agent controlling a tipping point AI, you have a problem: the bus number* of the human race has suddenly dropped to “you and your cohort”. If anything happens to you—and an AI being of the tipping point variety doesn’t imply it can help you with all of the things that might happen to you—then the AI is leaderless. Depending on its construction, this might mean that it goes rogue and does something weird, that it goes dormant and there’s no protection against a poorly built new AI project, or that it keeps doing whatever its last directive was (in the example under discussion, “prevent anyone from building another AI”). None of these are good states to have obtain permanently.

So you might want to define organizational continuity, and then architect that definition into your AI, robustly enough that none of those things will happen.

This isn’t trivial—it’s almost certainly easier than defining human value in general, but that doesn’t mean it’s simple. Your definition has to handle internal schisms, both overt and subtle, ranging from “the IT guy we fired is working for would-be rivals” to “there’s serious disagreement among our researchers about whether to go ahead with Project Turaco, and Frances and Harold are working on a Turaco fork in their garage”. If you don’t want the wrong bus accident (or assassination) to mean that humanity ends, encounters a hard stop in its technological progress, or has its panopticonic meddling intelligence inherited by a random person who chose the same name for their uber-for-spirulina business, then you need to have a way to pass on the mandate of heaven.

One idea that popped into my head while I was turning over this problem was a code of organizational conduct. This allows the organization to resume after a discontinuity, without granting random people a first-mover advantage at picking up the dropped mantle unless they take it up whole. It’s still a simpler problem than human value in general, but it’s intermediate between that and “define members of a conventional continuous group of humans”. The code has to be something that includes its own decisionmaking process—if six people across the globe adopt a code simultaneously, they’ll need to resolve conflicts between them just as much as the original organization did. You presumably want to incorporate security features that protect both against garage forks of Project Turaco and also against ill-intentioned or not-too-bright inheritors of your code.
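
A toy way to see what “includes its own decisionmaking process” plus “security features” might cash out to, purely as an illustration and not something proposed in the article or at the meeting: reduce one narrow slice of the code (namely “who counts as a legitimate successor controller?”) to a mechanical quorum check. Everything in the sketch below is made up: the Charter structure, the successor names, and the HMAC stand-in for signatures are hypothetical, and a real version would need genuine threshold signatures, key management, and a story for what happens when named successors disagree, which is exactly the conflict-resolution problem above.

```python
# Toy sketch only: a quorum check for handing over the "mandate of heaven".
# All names and the HMAC-based "endorsement" are stand-ins for illustration;
# a real system would use proper threshold signatures, not a shared secret.
import hmac
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Charter:
    successors: frozenset[str]  # people named in the code of conduct
    quorum: int                 # endorsements required before control transfers
    secret: bytes               # stand-in for real per-person signing keys


def endorsement(charter: Charter, endorser: str, candidate: str) -> str:
    """The tag an endorser produces to approve `candidate` as the new controller."""
    msg = f"{endorser} endorses {candidate}".encode()
    return hmac.new(charter.secret, msg, hashlib.sha256).hexdigest()


def mandate_transfers(charter: Charter, candidate: str, tags: dict[str, str]) -> bool:
    """True only if enough *named* successors validly endorse the candidate."""
    valid = 0
    for endorser, tag in tags.items():
        if endorser not in charter.successors:
            continue  # a random person picking up the dropped mantle doesn't count
        if hmac.compare_digest(endorsement(charter, endorser, candidate), tag):
            valid += 1
    return valid >= charter.quorum


charter = Charter(frozenset({"ada", "grace", "alan"}), quorum=2, secret=b"demo-key")
tags = {name: endorsement(charter, name, "frances") for name in ("ada", "grace")}
print(mandate_transfers(charter, "frances", tags))  # True: two named successors agree
```

Even this toy version makes the hard part visible: the check only answers “is this particular handoff legitimate?”; deciding who should be on the successor list in the first place, and under what process it can change, is the real problem, and the options below are different answers to it.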

Other options include:

  • Conventional organizational continuity. You have, perhaps, a board of directors who never share a vehicle, and they have some sort of input into the executives of the organization, and you hope nobody brings the plague to work, and there is some sort of process according to which decisions are made and some sort of process for defaulting if decisions fail to be made.

  • Designated organizational heirs: if your conventional organization fails, then your sister project, who are laying theoretical groundwork but not building anything yet because you have a tipping point AI and you said so, get the mandate of heaven and can proceed. This assumes that you think their chances of achieving value alignment are worse than yours but better than any other option. This has obvious incentive problems with respect to the other organization’s interest in yours suddenly ceasing to exist.

  • Non-organization-based strategies (a line of succession of individuals). People being changeable, this list would need to be carefully curated and carefully maintained by whoever was ascendant, and it would be at substantial risk of unobserved deception, errors in judgment, or evolution over time of heirs’ interests and capabilities after their predecessors can no longer edit the line of succession. These would all be capable of affecting the long-term future of humanity once the AI changed hands.

  • I’m sure there are things I haven’t thought of.

I don’t have a conclusion; I just wrote this up to share the thoughts I had in response to the meeting, so that people who can’t attend can still be in on some of what we’re talking and thinking about.

*The number of people who can be hit by a bus before the organization ceases to function.