Two More Decision Theory Problems for Humans

(This post has been sitting in my drafts folder for 6 years. Not sure why I didn’t make it public, but here it is now after some editing.)

There are two problems closely related to the Ontological Crisis in Humans. I’ll call them the “Partial Utility Function Problem” and the “Decision Theory Upgrade Problem”.

Partial Utility Function Problem

As I mentioned in a previous post, the only apparent utility function we have seems to be defined over an ontology very different from the fundamental ontology of the universe. But even on its native domain, the utility function seems only partially defined. In other words, it will throw an error (i.e., say “I don’t know”) on some possible states of the heuristical model. For example, this happens for me when the number of people gets sufficiently large, like 3^^^3 in Eliezer’s Torture vs Dust Specks scenario. When we try to compute the expected utility of some action, how should we deal with these “I don’t know” values that come up?

(Note that I’m presenting a simplified version of the real problem we face, where in addition to “I don’t know”, our utility function could also return essentially random extrapolated values outside of the region where it gives sensible outputs.)
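To make the shape of the problem concrete, here is a minimal Python sketch of my own (not anything from the original discussion): a utility function that is only defined on part of its native domain, and an expected-utility calculation that has to decide what to do when it hits an “I don’t know”. The cutoff, the numbers, and the “poison the whole expectation” policy are all made up for illustration.

```python
from typing import Optional

def partial_utility(num_people_saved: int) -> Optional[float]:
    """A toy partially-defined utility function: returns None ("I don't know")
    outside the region where it gives sensible outputs. The cutoff is arbitrary."""
    if num_people_saved > 10**6:
        return None
    return float(num_people_saved)

def expected_utility(outcomes: list[tuple[float, int]]) -> Optional[float]:
    """outcomes: (probability, num_people_saved) pairs.
    Here an undefined utility on any branch makes the whole expectation
    undefined -- one possible (very conservative) policy, not a recommendation."""
    total = 0.0
    for prob, outcome in outcomes:
        u = partial_utility(outcome)
        if u is None:
            return None  # the "I don't know" propagates
        total += prob * u
    return total

# 10**100 stands in for an unimaginably large number like 3^^^3.
print(expected_utility([(0.5, 100), (0.5, 10**100)]))  # -> None
print(expected_utility([(0.5, 100), (0.5, 200)]))      # -> 150.0
```

In this toy version the undefined value simply poisons the whole expectation; the obvious alternatives (ignore that branch, substitute some bound, renormalize over the defined branches) each amount to a substantive answer to the question being asked, which is the point of the problem.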

Decision Theory Upgrade Problem

In the Decision Theory Upgrade Problem, an agent decides that their current decision theory is inadequate in some way, and needs to be upgraded. (Note that the Ontological Crisis could be considered an instance of this more general problem.) The question is whether and how to transfer their values over to the new decision theory.

For example, a human might be running a mix of several decision theories: reinforcement learning, heuristical model-based consequentialism, identity-based decision making (where you adopt one or more social roles, like “environmentalist” or “academic”, as part of your identity and then make decisions based on pattern matching what that role would do in any given situation), as well as virtue ethics and deontology. If you are tempted to drop one or more of these in favor of a more “advanced” or “rational” decision theory, such as UDT, you have to figure out how to transfer the values embodied in the old decision theory, which may not even be represented as any kind of utility function, over to the new one.
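To make the structural point vivid, here is a toy Python rendering of my own (purely illustrative, with made-up actions, weights, and roles): three of these decision procedures facing the same situation, with the agent’s “values” stored in a different format in each, and only the third looking anything like a utility function.

```python
# A toy "mix of decision procedures" agent; everything here is invented
# for illustration.
situation = {"actions": ["recycle", "drive"]}

def rl_choose(situation):
    # Reinforcement learning: values are implicit in learned action weights.
    weights = {"recycle": 0.7, "drive": 0.2}
    return max(situation["actions"], key=lambda a: weights.get(a, 0.0))

def identity_choose(situation):
    # Identity-based decision making: values are implicit in adopted roles
    # and cached "what would that role do" patterns.
    role_patterns = {"environmentalist": "recycle", "academic": "drive"}
    return role_patterns["environmentalist"]

def consequentialist_choose(situation):
    # Heuristical model-based consequentialism: the closest thing to an
    # explicit (partial) utility function over imagined outcomes.
    predicted_outcome = {"recycle": "less_pollution", "drive": "more_pollution"}
    utility = {"less_pollution": 1.0, "more_pollution": -1.0}
    return max(situation["actions"],
               key=lambda a: utility[predicted_outcome[a]])

# "Upgrading" to a single theory such as UDT would mean replacing all three,
# but the values embodied in the first two are not stored as anything
# resembling a utility function, so it is unclear what should be ported.
for chooser in (rl_choose, identity_choose, consequentialist_choose):
    print(chooser.__name__, "->", chooser(situation))
```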

Another instance of this problem can be seen in someone just wanting to be a bit more consequentialist. Maybe UDT is too strange and impractical, but our native model-based consequentialism at least seems closer to being rational than the other decision procedures we have. In this case we tend to assume that the consequentialist module already has our real values and we don’t need to “port” values from the other decision procedures that we’re deprecating. But I’m not entirely sure this is safe, since the step going from (for example) identity-based decision making to heuristical model-based consequentialism doesn’t seem that different from the step between heuristical model-based consequentialism and something like UDT.