What I Think, If Not Why

Reply to: Two Visions Of Heritage

Though it really goes tremendously against my grain (it feels like sticking my neck out over a cliff, or something), I guess I have no choice here but to try and make a list of just my positions, without justifying them. We can only talk justification, I guess, after we get straight what my positions are. I will also leave off many disclaimers, to present the points compactly enough to be remembered.

• A well-designed mind should be much more efficient than a human, capable of doing more with less sensory data and fewer computing operations. It is not infinitely efficient and does not use zero data. But it does use little enough that local pipelines such as a small pool of programmer-teachers and, later, a huge pool of e-data, are sufficient.

• An AI that reaches a certain point in its own development becomes able to (sustainably, strongly) improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability. This point is at, or probably considerably before, a minimally transhuman mind capable of writing its own AI-theory textbooks—an upper bound beyond which it could swallow and improve its entire design chain.

• It is likely that this capability increase or “FOOM” has an intrinsic maximum velocity that a human would regard as “fast” if it happens at all. A human week is ~1e15 serial operations for a population of 2GHz cores, and a century is ~1e19 serial operations; this whole range is a narrow window. However, the core argument does not require one-week speed, and a FOOM that takes two years (~1e17 serial ops) will still carry the weight of the argument.
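As a sanity check on those orders of magnitude, here is a quick back-of-the-envelope calculation; the only inputs are the 2GHz clock rate and ordinary calendar arithmetic, and the totals come out at the figures quoted above:

```python
# Rough serial-operation counts at 2 GHz (~2e9 operations per second).
CLOCK_HZ = 2e9
SECONDS_PER_WEEK = 7 * 24 * 3600          # ~6.0e5 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600     # ~3.2e7 s

week_ops = CLOCK_HZ * SECONDS_PER_WEEK            # ~1.2e15, i.e. order 1e15
two_year_ops = CLOCK_HZ * 2 * SECONDS_PER_YEAR    # ~1.3e17, i.e. order 1e17
century_ops = CLOCK_HZ * 100 * SECONDS_PER_YEAR   # ~6.3e18, i.e. order 1e19

print(f"week: {week_ops:.1e}  two years: {two_year_ops:.1e}  century: {century_ops:.1e}")
```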

• The default case of FOOM is an unFriendly AI, built by researchers with shallow insights. This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).

• The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee. The guarantee is written over the AI’s internal search criterion for actions, rather than external consequences.

• The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever. There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics. It is dealt with at length in the document Coherent *Extrapolated* Volition. It is the first thing, the last thing, and the middle thing that I say about Friendly AI. I have said it over and over. I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.

The good guys do not directly impress their personal values onto a Friendly AI.

• Actually setting up a Friendly AI’s values is an extremely meta operation, less “make the AI want to make people happy” and more like “superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition”. This actually seems to be something of a Pons Asinorum in FAI—the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas. Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).
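Purely to illustrate the shape of that meta operation, not its content (which is the hard part), here is a toy sketch under made-up assumptions of my own: candidate reflective equilibria are represented as weighted hypotheses about what humanity’s values would cohere to, and the selected output is whichever candidate policy has the most support summed across that superposition. The weights and policy labels are all invented for illustration.

```python
# Toy illustration only: weighted hypotheses about humanity's reflective
# equilibrium, each endorsing some set of (hypothetical) policies.
candidate_equilibria = [
    (0.5, {"policy_A", "policy_B"}),   # (weight in the superposition, endorsed policies)
    (0.3, {"policy_B", "policy_C"}),
    (0.2, {"policy_C"}),
]

def support(policy):
    """Total weight of the candidate equilibria that endorse this policy."""
    return sum(w for w, endorsed in candidate_equilibria if policy in endorsed)

policies = {p for _, endorsed in candidate_equilibria for p in endorsed}
best = max(policies, key=support)
print(best, support(best))   # policy_B, 0.8: the most coherent support in the superposition
```

Nothing in this toy captures the hard parts (extrapolation, coherence, or what an “equilibrium” even is); it only shows why the operation is meta, an aggregation over possibilities, rather than a direct encoding of any particular value.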

I myself am strongly individualistic: The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf. It is also a known principle of hedonic psychology that people are happier when they’re steering their own lives and doing their own interesting work. When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background, silent as the laws of Nature once were; and finally folding up and vanishing when it is no longer needed. But this is only the thought of my mind that is merely human, and I am barred from programming any such consideration directly into a Friendly AI, for the reasons given above.

• Nonetheless, it does seem to me that this particular scenario could not be justly described as “a God to rule over us all”, unless the current fact that humans age and die is “a malevolent God to rule us all”. So either Robin has a very different idea about what human reflective equilibrium values are likely to look like; or Robin believes that the Friendly AI project is bound to fail in such a way as to create a paternalistic God; or—and this seems more likely to me—Robin didn’t read all the way through all the blog posts in which I tried to explain all the ways that this is not how Friendly AI works.

• Friendly AI is technically difficult and requires an extraordinary effort on multiple levels. English sentences like “make people happy” cannot describe the values of a Friendly AI. Testing is not sufficient to guarantee that values have been successfully transmitted.

• White-hat AI researchers are distinguished by the degree to which they understand that a single misstep could be fatal, and can discriminate between strong and weak assurances. Good intentions are not only common, they’re cheap. The story isn’t about good versus evil; it’s about people trying to do the impossible versus others who… aren’t.

• Intelligence is about being able to learn lots of things, not about knowing lots of things. Intelligence is especially not about tape-recording lots of parsed English sentences a la Cyc. Old AI work was poorly focused due to an inability to introspectively see the first and higher derivatives of knowledge; human beings have an easier time reciting sentences than reciting their ability to learn.

Intelligence is mostly about architecture, or “knowledge” along the lines of knowing to look for causal structure (Bayes-net type stuff) in the environment; this kind of knowledge will usually be expressed procedurally as well as declaratively. Architecture is mostly about deep insights. This point has not yet been addressed (much) on Overcoming Bias, but Bayes nets can be considered an archetypal example of “architecture” and “deep insight”. Also, ask yourself how lawful intelligence seemed to you before you started reading this blog, how lawful it seems to you now, and then extrapolate outward from that.
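To make “Bayes-net type stuff” slightly more concrete, here is a minimal sketch (my own toy example with invented probabilities, not anything from the post): a three-variable causal network where the architectural knowledge is the graph structure itself, and inference over it is just mechanical summation.

```python
from itertools import product

# Toy conditional probability tables (invented numbers) for the classic
# rain/sprinkler/wet-grass network: Rain -> Sprinkler, and (Sprinkler, Rain) -> WetGrass.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},    # P(Sprinkler | Rain=True)
               False: {True: 0.4, False: 0.6}}     # P(Sprinkler | Rain=False)
P_wet = {(True, True): 0.99, (True, False): 0.9,   # keyed by (Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability, factored along the network's causal structure."""
    p_w = P_wet[(sprinkler, rain)]
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * (p_w if wet else 1 - p_w)

# P(Rain | WetGrass=True) by brute-force enumeration over the hidden variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"P(Rain | WetGrass) = {num / den:.3f}")   # ~0.358
```

The toy numbers are beside the point; the architectural part is knowing to carve the environment into causes and effects at all, after which the inference machinery is the same few lines for any domain.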