Preferences over non-rewards

In this penultimate post in the “learning human values” series, I just want to address some human values/preferences/rewards that don’t fit neatly into the (p, R) model, where p is the planning algorithm and R the actual reward.
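As a minimal sketch of the (p, R) decomposition, for readers who haven't seen the earlier posts (the function names and the trivial "grid" of states are illustrative assumptions for this example, not part of the model itself):

```python
# Illustrative sketch of the (p, R) decomposition: observed behaviour
# (a policy) is modelled as the output of a planning algorithm p
# applied to a reward R. All names here are hypothetical.

def reward(state):
    # R: the "actual reward" -- here, simply preferring higher-valued states.
    return state

def planner(R, states):
    # p: a perfectly rational planner that picks the reward-maximising
    # state. Real human planners are biased and boundedly rational.
    return max(states, key=R)

def policy(states):
    # The observed behaviour p(R): all an outside observer sees.
    return planner(reward, states)

print(policy([0, 3, 1]))  # prints 3: the planner picks the highest-reward state
```

The point of the framework is that only the policy is observable; the posts in this series ask which (p, R) pairs could explain it, and this post asks which preferences escape the format entirely.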

Preferences over preferences and knowledge

Most people have preferences over their own preferences, and over those of others. For example, consider someone who holds an incorrect religious faith. They might believe something like:

“I want to always continue believing. I flinch away from certain sceptical arguments, but I’m sure my deity would protect me from doubt if I ever decided to look into them”.

I hope this doesn’t sound completely implausible as a description of someone. Here they have beliefs, preferences over their future beliefs, and beliefs over their future beliefs. This doesn’t seem easy to capture in the (p, R) framework. We can also see that asking them the equivalent questions “Do you want to doubt your deity?” and “Do you want to learn the truth?” will get very different answers.

But it’s not just theism, an example which is too easy to pick on. I have preferences over knowledge, as do most people. I would prefer that people had accurate information, for instance. I would also prefer that, when choosing between possible formalisations of preferences, people went with the less destructive and less self-destructive options. These are not overwhelmingly strong preferences, but they certainly exist.


Consider the following scenario: someone believes that roller-coasters are perfectly safe, but enjoys riding them for the feeling of danger they give. It’s clear that the challenge here is not reconciling the belief of safety with the alief of danger (which is simple: roller-coasters are safe), but somehow transforming the feeling of danger into another form that preserves the initial enjoyment.

Tribalism and signalling

The theism example might suggest that tribalism will be a major problem, as various groups pressure adherents to conform to certain beliefs and preferences.

But actually that need not be such a problem. There is clearly a strong desire to remain part of that group (or, sometimes, just of a group). Once that desire is identified, all the rest becomes instrumental: the human will either do the actions needed to remain part of the group without changing their beliefs or preferences (just because evolution doesn’t allow us to separate those two easily doesn’t mean an AI can’t help us do it), or will rationally sacrifice beliefs and preferences to the cause of remaining part of the group.

Most signalling cases can be dealt with in the same way. So, though tribalism is a major reason people can end up with contingent preferences, it doesn’t in itself pose problems for the (p, R) model.

Personal identity

The problem of personal identity is a tricky one. I would like to remain alive, happy, and curious, having interesting experiences, doing worthwhile and varied activities, and so on.

Now, this is partially a matter of preferences about future preferences, but there’s also an implicit identity: I want this to happen to me. Even when I’m being altruistic, I want these experiences to happen to someone, not just to happen in some abstract sense.

But the concept of personal identity is a complicated one, and it’s not clear if it can be collapsed easily into the (p, R) format.

“You’re not the boss of me!”

Finally, even if personal identity is defined, it remains the case that people judge situations differently depending on how those situations are achieved. Being forced or manipulated into a situation will make them resent it much more than if they reached it through “natural” means. Of course, what counts as acceptable and unacceptable manipulation changes over time, and is filled with biases, inconsistencies, and incorrect beliefs (in my experience, far too many people think themselves immune to advertising, for instance).

Caring about derivatives rather than positions

People react strongly to situations getting worse or better, not so much to the absolute quality of the situation.
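A toy illustration of the distinction (the specific reward functions and trajectories are assumptions made up for this example): an agent that cares about derivatives can prefer a trajectory that is worse at every single moment, so the two kinds of preference cannot be collapsed into one reward over situations.

```python
# Two hypothetical reward functions over a sequence of situation
# "qualities": one cares about absolute positions, the other about
# derivatives (how the situation is changing step to step).

def position_reward(trajectory):
    # Sum of the absolute quality at each step.
    return sum(trajectory)

def derivative_reward(trajectory):
    # Sum of step-to-step changes: only improvement or decline matters.
    return sum(b - a for a, b in zip(trajectory, trajectory[1:]))

improving = [1, 2, 3]   # things getting steadily better
declining = [5, 4, 3]   # things getting worse, but from a higher level

# The declining trajectory is better at every step...
print(position_reward(declining) > position_reward(improving))   # True
# ...yet the derivative-carer prefers the improving one.
print(derivative_reward(improving) > derivative_reward(declining))  # True
```

A (p, R) model with R defined over states alone would have to pick one of these orderings; capturing both seems to require R over histories or changes.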

Values that don’t make sense out of context

AIs would radically reshape the world and society. And yet humans have deeply held values that only make sense in narrow contexts; sometimes, they already no longer make sense. For instance, in my opinion, of the five categories in moral foundations theory, one no longer makes sense and three only make partial sense (and it seems to me that having these values in a world where it’s literally impossible to satisfy them is part of the problem people have with the modern world):

  • Care: cherishing and protecting others. This seems to me the strongest foundation; care remains well defined today, most especially in the negative “protect people from harm” sense.

  • Purity: abhorrence for disgusting things, foods, actions. This seems the weakest foundation. Our ancestral instincts of disgust for food and people are no longer correlated with actual danger, or with much of anything. Disgust is the easiest value to argue against, and the hardest to defend, because it provokes such strong feelings but the boundaries drawn around the objects of disgust make no sense.

  • Fairness: rendering justice according to shared rules. Fairness and equality make only partial sense in today’s world. It seems impossible to ensure that every interaction is fair, and that everyone gets their just deserts (whatever that means) or gets the same opportunities. But two subcategories do exist: legal rights fairness/equality, and financial fairness/equality. Modern societies achieve the first to some extent, and make attempts at the second.

  • Authority: submitting to tradition and legitimate authority. This also makes partial sense. Tradition is a poor guide in many situations, and the source of authority doesn’t simplify real problems or guarantee solutions (which is the main reason that dictators are not generally any better at solving problems). As with fairness, the subcategory of legal authority is used extensively in the world today.

  • Loyalty: standing with your group, family, nation. This value is weak, and may end up weakening further, down to the level of purity. There are basically too many positive-sum interactions in today’s world: the benefits of trade, and of interacting with those outside your ingroup, are huge. Legally, most of loyalty is actually forbidden; we don’t have laws encouraging nepotism, rather the opposite.

This can be seen as a subset of the whole “underdefined human values” problem, but it could also be seen as an argument for preserving or recreating certain contexts in which these values make sense.

A more complex format needed

These are just some of the challenges to the (p, R) format, and there are certainly others. It’s not clear how much that format needs to be complicated in order to usefully model all these extra types of preferences.
