The Friendly AI Game

At the recent London meet-up someone (I’m afraid I can’t remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn’t give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a “fun game” out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
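The failure mode here can be seen even in a toy setting. Below is a minimal sketch (all names and numbers hypothetical, not anyone's actual proposal): an agent whose utility function counts only a small region still prefers actions that consume everything outside it, because outside resources can be turned into in-region awesomeness.

```python
# Toy illustration: a utility function restricted to a small region
# does NOT make the agent indifferent to the rest of the world.

REGION = range(0, 3)      # cells the agent "cares about"
OUTSIDE = range(3, 10)    # the rest of the (tiny) universe

def utility(state):
    # Utility is computed ONLY over the region...
    return sum(state[i] for i in REGION)

def convert_to_factory(state, cell):
    # ...but actions elsewhere can still raise it: converting an
    # outside cell into a factory boosts every in-region cell.
    new = list(state)
    new[cell] = 0                 # the outside cell is consumed
    for i in REGION:
        new[i] += 1               # the region gets more awesome
    return new

state = [1] * 10                  # every cell starts mildly awesome
for cell in OUTSIDE:
    candidate = convert_to_factory(state, cell)
    if utility(candidate) > utility(state):   # the agent always says yes
        state = candidate

print(utility(state))   # region utility climbed from 3 to 24
print(state[3:])        # every outside cell was consumed: all zeros
```

Nothing in the utility function penalizes the conversions, so a maximizer takes all of them; "doesn't care about the outside" and "leaves the outside alone" are very different properties.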

Here’s the game: reply to this post with proposed utility functions, stated as formally or, at least, as accurately as you can manage; follow-up comments explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.

There are three reasons I suggest playing this game. In descending order of importance, they are:

  1. It sounds like fun.

  2. It might help to convince people that the Friendly AI problem is hard(*).

  3. We might actually come up with something that’s better than anything anyone’s thought of before, or something where the proof of Friendliness is within grasp—the solutions to difficult mathematical problems often look obvious in hindsight, and it surely can’t hurt to try.

DISCLAIMER (prob­a­bly un­nec­es­sary, given the au­di­ence) - I think it is un­likely that any­one will man­age to come up with a for­mally stated util­ity func­tion for which none of us can figure out a way in which it could go hideously wrong. How­ever, if they do so, this does NOT con­sti­tute a proof of Friendli­ness and I 100% do not en­dorse any at­tempt to im­ple­ment an AI with said util­ity func­tion.
(*) I’m slightly worried that it might have the opposite effect, as people build more and more complicated conjunctions of desires to overcome the objections that we’ve already seen, and start to think the problem comes down to nothing more than writing a long list of special cases. On balance, though, I think that’s likely to have less of an effect than just seeing how naive suggestions for Friendliness can be hideously broken.