Complexity of Value ≠ Complexity of Outcome

Complexity of value is the thesis that our preferences, the things we care about, don’t compress down to one simple rule, or a few simple rules. To review why it’s important (by quoting from the wiki):

  • Caricatures of rationalists often have them moved by artificially simplified values—for example, only caring about personal pleasure. This becomes a template for arguing against rationality: X is valuable, but rationality says to only care about Y, in which case we could not value X, therefore do not be rational.

  • Underestimating the complexity of value leads to underestimating the difficulty of Friendly AI; and there are notable cognitive biases and fallacies which lead people to underestimate this complexity.

I certainly agree with both of these points. But I worry that we (at Less Wrong) might have swung a bit too far in the other direction. No, I don’t think we overestimate the complexity of our values; rather, there’s a tendency to assume that complexity of value must lead to complexity of outcome, that is, that agents who faithfully inherit the full complexity of human values will necessarily create a future that reflects that complexity. I will argue that it is possible for complex values to lead to simple futures, and explain the relevance of this possibility to the project of Friendly AI.

The easiest way to make my argument is to start by considering a hypothetical alien with all of the values of a typical human being, but also an extra one. His fondest desire is to fill the universe with orgasmium, which he considers to have orders of magnitude more utility than realizing any of his other goals. As long as his dominant goal remains infeasible, he’s largely indistinguishable from a normal human being. But if he happens to pass his values on to a superintelligent AI, the future of the universe will turn out to be rather simple, despite those values being no less complex than any human’s.
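To make this concrete, here is a minimal toy sketch of the alien’s preferences. The numbers and the two-part outcome encoding are made up purely for illustration, but they capture the structure of the example: one hugely weighted term sits on top of otherwise ordinary values, and it only starts to matter once it becomes feasible.

```python
# A toy sketch of the hypothetical alien's preferences. The specific numbers
# and the (human_score, orgasmium_fraction) outcome encoding are illustrative
# assumptions, not part of the original argument.

ORGASMIUM_WEIGHT = 1e9  # "orders of magnitude more utility" than anything else

def alien_utility(outcome):
    human_score, orgasmium_fraction = outcome
    return human_score + ORGASMIUM_WEIGHT * orgasmium_fraction

# While the dominant goal is infeasible, every reachable outcome has zero
# orgasmium, so the alien ranks options exactly as a normal human would.
feasible_now = [(3.0, 0.0), (7.0, 0.0), (5.0, 0.0)]
print(max(feasible_now, key=alien_utility))       # (7.0, 0.0) -- looks human

# Once a superintelligent AI makes the dominant goal feasible, that one term
# swamps everything else and the chosen future is very simple.
feasible_with_ai = feasible_now + [(0.0, 1.0)]
print(max(feasible_with_ai, key=alien_utility))   # (0.0, 1.0) -- all orgasmium
```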

The above possibility is easy to reason about, but perhaps does not appear very relevant to our actual situation. I think that it may be, and here’s why. All of us have many different values that do not reduce to each other, but most of those values do not appear to scale very well with available resources. In other words, among our manifold desires, there may only be a few that are not easily satiated when we have access to the resources of an entire galaxy or universe. If so (and assuming we aren’t wiped out by an existential risk or fall into a Malthusian scenario), the future of our universe will be shaped largely by those values that do scale. (I should point out that in this case the universe won’t necessarily turn out to be mostly simple. Simple values do not necessarily lead to simple outcomes either.)
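One way to see why the values that scale would end up dominating is with a toy resource-allocation model. The functional forms and constants below are assumptions made up solely to illustrate the satiation point: one value saturates quickly, another keeps paying off linearly, and as the resource budget grows, the optimal allocation devotes a vanishing share to the value that saturates.

```python
import math

# A toy illustration of "values that scale" vs. values that are easily
# satiated. The functional forms and constants are illustrative assumptions.

def total_utility(r_a, r_b):
    saturating = 10 * (1 - math.exp(-r_a))  # value A: bounded, quickly satiated
    scaling = 0.01 * r_b                    # value B: keeps paying off at scale
    return saturating + scaling

def best_share_for_a(R, steps=10_000):
    # Brute-force search over how much of the budget R to devote to value A.
    best_r_a = max((i * R / steps for i in range(steps + 1)),
                   key=lambda r_a: total_utility(r_a, R - r_a))
    return best_r_a / R

for R in (10, 1_000, 1_000_000_000):
    print(f"R={R:>13,}: share spent on the easily satiated value "
          f"= {best_share_for_a(R):.6f}")
# As R grows, the share going to value A tends to zero: the large-scale shape
# of the outcome is determined almost entirely by the value that scales.
```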

Now if we were rational agents who had perfect knowledge of our own preferences, then we would already know whether this is the case or not. And if it is, we ought to be able to visualize what the future of the universe would look like if we had the power to shape it according to our desires. But I find myself uncertain on both questions. Still, I think this possibility is worth investigating further. If it turns out that only a few of our values scale, then we could potentially obtain almost all that we desire by creating a superintelligence with just those values. And perhaps this can be done manually, bypassing an automated preference extraction or extrapolation process with its associated difficulties and dangers. (To head off a potential objection: this does assume that our values interact in an additive way. If there are values that don’t scale but interact nonlinearly (multiplicatively, for example) with values that do scale, then those would need to be included as well.)
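The caveat in that parenthetical can also be made concrete with a toy example, again with made-up values and functional forms: under a purely additive combination, leaving out an easily satiated value costs almost nothing, but if that same value multiplies a value that does scale, leaving it out forfeits nearly everything.

```python
# A minimal sketch of the non-additive caveat. The values, functional forms,
# and numbers are made-up assumptions used only to illustrate the point.

R = 1e6                               # total resources

def hedons(r):
    return r                          # the scaling value: linear in resources

def variety(r):
    return min(r, 1.0)                # easily satiated: maxed out at tiny cost

with_variety = (1.0, R - 1.0)         # pay variety's small resource cost
without_variety = (0.0, R)            # optimize for the scaling value alone

def additive(v, h):
    return variety(v) + hedons(h)

def multiplicative(v, h):
    return variety(v) * hedons(h)

# Additive case: dropping the non-scaling value costs essentially nothing.
print(additive(*with_variety) - additive(*without_variety))              # 0.0

# Multiplicative case: dropping it forfeits nearly all of the value.
print(multiplicative(*with_variety) - multiplicative(*without_variety))  # ~R
```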

Whether or not we actually should take this approach would depend on the outcome of such an investigation. Just how much of what we desire can feasibly be obtained this way? And how does the loss of value inherent in this approach compare with the expected loss of value due to potential errors in the extraction/extrapolation process? These are questions worth trying to answer before committing to any particular path, I think.

P.S. I hesitated a bit in posting this, because underestimating the complexity of human values is arguably a greater danger than overlooking the possibility that I point out here, and this post could conceivably be used by someone to rationalize sticking with their “One Great Moral Principle”. But I guess those tempted to do so will tend not to be Less Wrong readers, and seeing how I already got myself sucked into this debate, I might as well clarify and expand on my position.