Decision Theory with F@#!ed-Up Reference Classes

Before we can answer the question of what you ought to do, we need to identify exactly which agents are referred to by "you". In some problems, "you" refers to only a single, easily identifiable agent whose actions produce deterministic results, but in other problems many agents will experience positions that are completely indistinguishable. Even then, we can normally identify a fixed set of agents who are possibly you and average over them. However, there exists a set of problems where this set of indistinguishable agents depends on the decision that you make, at which point it becomes rather unclear who exactly you are trying to optimise over. We will say that these problems have Decision-Inconsistent Reference Classes.

While this may seem like merely a niche issue, given the butterfly effect and a sufficiently long timeline with the possibility of simulations, it is almost guaranteed that any decision will change the reference class. So understanding how to resolve these issues is more important than it might first appear. Further, if I am correct, Imperfect Parfit's Hitchhiker doesn't have a real answer and UDT would require some rather significant modifications.

(This post is based upon the material in this comment, which I said that I was planning on developing into a full post. It contains some substantial corrections and additions.)


My exploration of this area is mostly motivated by Imperfect Parfit's Hitchhiker. Here we define this as Parfit's Hitchhiker with a driver who always detects when you are telling the truth about paying, but 1% of the time picks you up independently of whether you are or aren't being truthful. We'll also imagine that those agents who arrive in town discover a week after their decision whether or not they were in the group who would have been picked up independent of their decision.

Solving this problem involves challenges that aren't present in the version with perfect predictors. After all, once we've defined a notion of counterfactuals for perfect predictors (harder than it looks!), it's clear that defecting against these predictors is a losing strategy. There is no (inside-view) downside to committing to taking a sub-optimal action given an input that ought to be impossible. However, as soon as the predictors have even an arbitrarily small amount of imperfection, choosing to pay actually means giving up something.

Given the natural human tendency towards fairness, it may be useful to recall the True Prisoner's Dilemma—what if instead of rescuing one person, the driver rescued your entire family, and instead of demanding $50 he demanded that a random 50% of you be executed? In this new scenario, refusing to "pay" him for his services no longer seems quite so fair. And if you can get the better of him, why not do so? Or if this isn't sufficient, we can imagine the driver declaring that it's fair game to try fooling him into thinking that you'll pay.

Now that our goal is to beat the driver if that is at all possible, we can see that this is prima facie ambiguous, as it isn't clear which agents we wish to optimise over. If we ultimately defect, then only 1% of agents arrive in town, but if we ultimately pay, then 100% of agents arrive in town. Should we optimise over the 1% or the 100%? Consider the period after you've locked in your decision, but before it's revealed whether you would have been picked up anyway (call this the immediate aftermath). Strangely, in the immediate aftermath you will reflectively endorse whatever decision you made. An agent who decided to defect knows that they were in the 1% and so they would have always ended up in town; while those who decided to pay will assign only a 1% probability that they were going to be picked up if they hadn't paid. In the latter case, the agent may later regret paying if they discover that they were indeed in the 1%, but only later, not in the immediate aftermath.
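This reflective endorsement can be checked with a quick sketch. The utility numbers are illustrative assumptions, not from the original: surviving to town is worth 1,000,000, dying in the desert 0, and the payment is the $50 fee.

```python
# Imperfect Parfit's Hitchhiker: which reference class should we average over?
# Payoff numbers below are assumptions chosen for illustration.
TOWN, DESERT, FEE = 1_000_000, 0, 50
PICKUP_ANYWAY = 0.01  # the driver picks you up regardless 1% of the time

# Averaging over everyone who starts in the desert:
ex_ante_pay = TOWN - FEE                                          # all arrive
ex_ante_defect = PICKUP_ANYWAY * TOWN + (1 - PICKUP_ANYWAY) * DESERT

# Averaging only over those who actually arrive in town:
in_town_pay = TOWN - FEE   # 100% of agents arrive, and all of them pay
in_town_defect = TOWN      # only the lucky 1% arrive, and they keep the fee

print(ex_ante_pay, ex_ante_defect)  # paying wins over the full class
print(in_town_pay, in_town_defect)  # defecting wins over the town-only class
```

Each decision comes out best under the very reference class that making it induces, which is the ambiguity described above.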

Some Example Problems

It’ll be easier to attempt solving this problem if we gather some other problems that mess with reference classes. One such problem is the Evil Genie Puzzle I defined in a previous post. This creates the exact opposite problem—you reflectively regret whichever decision you choose. If you choose the million dollars (I wrote perfect life instead in the post), you know in the immediate aftermath that you are almost certainly a clone, so you should expect to be tortured. However, if you choose the rotten eggs, you know in the immediate aftermath that you could have had a million dollars.

Since one potential way of evaluating situations with Decision-Inconsistent Reference Classes is to simply compare averages, we’ll also define the Dilution Genie Puzzle. In this puzzle, a genie offers you $1,000,001 or $10. However, if the genie predicts that you will choose the greater amount, it creates 999,999 clones of you who will face what seems like an identical situation, but they will actually each only receive $1 when they inevitably choose the same option as you. This means that choosing $1,000,001 really provides an average of $2, so choosing $10 might actually be a better decision, though if you do actually take it you could have won the million.
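The $2 average quoted above follows directly from the payouts in the puzzle:

```python
# Dilution Genie: average payout per agent for each prediction the genie
# might make, using the payouts stated in the puzzle.
big_takers = 1_000_000  # you plus the 999_999 clones the genie creates

avg_if_choose_big = (1_000_001 + 999_999 * 1) / big_takers  # $2 each
avg_if_choose_small = 10 / 1                                # no clones created

print(avg_if_choose_big, avg_if_choose_small)
```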

Possible Approaches:

Suppose an agent G faces a decision D represented by input I. The most obvious approaches for evaluating these decisions are as follows:

1) Individual Averages: If X is an option, calculate the expected utility of X by averaging over all agents who experience input I if G chooses X. Choose the option with the highest expected utility.

This approach defects on Imperfect Parfit’s Hitchhiker, chooses the rotten eggs for Evil Genie and chooses the $10 for Dilution Genie. Problems like Perfect Parfit’s Hitchhiker and Perfect Retro Blackmail are undefined as the reference class is empty. We can’t substitute an average utility of 0 for an empty reference class since, in Perfect Parfit’s Hitchhiker, this results in us dying in the desert. We also can’t strike out these options and choose from those remaining, since in Retro Blackmail this would result in us crossing out the option to not pay. So a major flaw with this approach is that it doesn’t handle problems where one decision invalidates the reference class.

It is also somewhat counterintuitive that individuals who count when evaluating one possible option may not count when evaluating another option for the same decision, even if they still exist.

2) Pairwise Averages: If X & Y are options, compare these pairwise by calculating the average utility over all agents who experience input I if G chooses either X or Y. Non-existence is treated as a 0.

This approach pays in Perfect or Imperfect Parfit’s Hitchhiker, chooses the rotten eggs for Evil Genie, $1,000,001 for Dilution Genie and refuses to pay in Retro Blackmail.

Unfortunately, this doesn’t necessarily provide a consistent ordering, as we’ll shortly see. The following diagram represents what I’ll call the Staircase Prediction Problem because of the shape of the underlined entries:

#: 1 2 3 4 5 6 7
A: 0 1 0 0 0 0 0
B: 0 0 2 0 0 0 0
C: 0 0 0 0 1 0 0

There are 7 agents (numbered 1-7) who are identical clones and three different possible decisions (labelled A-C). None of the clones know which one they are. A perfect predictor predicts which option person 1 will pick if they are woken up in town; since they are clones, the others will also choose the same option if they are woken up in town.

The underlined entries indicate which people will be woken up in town if it is predicted that person 1 will choose that option, and the non-underlined entries indicate who will be woken up on a plain. For those who are in town, the numbers indicate how much utility each agent is rewarded with if they choose that option. For those who aren’t in town, the agent is instead rewarded (or not) based on what they would counterfactually do if they were in town.

Comparing the lines pairwise to see what decision we should make in town, we find:

B beats A (2/6 vs. 1/6)

C beats B (1/6 vs. 0/6)

A beats C (1/6 vs. 0/6)

Note that to be included in the average, a person only needs to be woken in town under one of the two options.

Since this provides an inconsistent ordering, this approach must be flawed.
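The cycle can be verified mechanically. The town-sets below are a hypothetical assignment, chosen only so that they reproduce the averages stated above; the pairwise reference class for comparing X and Y is everyone woken in town under the prediction of X or of Y.

```python
# Pairwise Averages on the Staircase Prediction Problem.
utility = {  # utility each agent receives under each option (from the table)
    "A": {2: 1},
    "B": {3: 2},
    "C": {5: 1},
}
town = {  # agents woken in town under each predicted option (an assumption)
    "A": {1, 2, 3, 4},
    "B": {4, 5, 6},
    "C": {1, 2, 6, 7},
}

def pairwise(x, y):
    """Average utilities of options x and y over the class town[x] | town[y]."""
    cls = town[x] | town[y]
    avg = lambda opt: sum(utility[opt].get(agent, 0) for agent in cls) / len(cls)
    return avg(x), avg(y)

print(pairwise("A", "B"))  # B beats A
print(pairwise("B", "C"))  # agent 3 drops out of the class, so C beats B
print(pairwise("C", "A"))  # agent 5 drops out of the class, so A beats C
```

Each comparison drops a different agent from the reference class, which is exactly what lets the ordering go in a circle.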

3) Overall Averages: If X is an option, calculate the expected utility of X by averaging over all agents for which there is at least one option Y where they experience input I when G chooses Y. Non-existence is treated as a 0.

This approach is the same as 2) in many problems: it pays in Perfect or Imperfect Parfit’s Hitchhiker, chooses the rotten eggs for Evil Genie, $1,000,001 for Dilution Genie and refuses to pay in Retro Blackmail.

However, we run into issues with irrelevant considerations changing our reference classes. We will call this situation the Catastrophe Button Scenario.

#: 1 2 3
A: 3 3 0
B: 4 4 −1007
C: -∞ -∞ -∞

Again, underlined represents being woken up in town and non-underlined represents being woken up on the plain. As before, who is woken up where is based on the prediction of Person 1's decision, and agents who wake up in town don’t know who they are. C is the option representing pressing the Catastrophe Button. No-one wants to press this button as it leads to an unimaginably bad outcome. Yet, using overall averages, the presence of C makes us include Person 3 in our calculation of averages. Without person 3, A provides an average utility of 3 and B of 4. However, with person 3, A provides an average of 2 and B an average of −333. So the presence of the Catastrophe Button reverses the option we choose, despite it being a button that we will never press, and hence clone 3 never being woken up in town. This seems absurd.
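The flipped averages come straight from the table:

```python
# Catastrophe Button: including the never-woken Person 3 in the reference
# class flips the comparison between A and B (utilities from the table above).
A = [3, 3, 0]
B = [4, 4, -1007]

without_3 = [sum(row[:2]) / 2 for row in (A, B)]  # persons 1 and 2 only
with_3 = [sum(row) / 3 for row in (A, B)]         # all three persons

print(without_3)  # B looks best
print(with_3)     # A looks best
```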

But I care about all of my clones

We actually don’t need full clones in order to create these results. We can work with what I like to call semi-clones—agents that make exactly the same decisions in particular situations, but which have an incredibly different life story/preferences. For example, we could take an agent and change the country it was brought up in, the flavours of ice-cream it likes and its general personality, whilst leaving its decision theory components exactly the same. Even if you necessarily care about your clones, there’s much less reason for a selfish agent to care about its semi-clones. Or if that is insufficient, we can imagine that your semi-clones teamed up to murder your family. The only requirement for being a semi-clone is that they come to the same decision for a very restricted range of decision theory problems.

So if we make the individuals all different semi-clones, but keep them from knowing their identity, they should only care about the agents that were woken up in the town, as these are the only agents that are indistinguishable.

What about UDT?

UDT only insists on a utility function from the cross-product of execution histories of a set of programs to the real numbers and doesn’t define anything about how this function ought to behave. There is no restriction on whether it ought to be using the Self-Indication Assumption or the Self-Sampling Assumption for evaluating execution histories with varying numbers of agents. There is no requirement to care about semi-clones or not care about them.

The only real requirement of the formalism is to calculate an average utility over all possible execution histories, weighted by the probability of them occurring. So, for example, in Imperfect Parfit’s Hitchhiker with a single agent, we can’t just calculate an average for the cases where you happen to be in town; we need to assign a utility for when you are left in the desert. But if we ran the problem with 100 hitchhikers, one of whom would always be picked up independently of their decision, we could define a utility function that only took into account those who actually arrived in town. But this utility function isn’t just used to calculate the decision for one input, but an input-output map for all possible inputs and outputs. It seems ludicrous that decisions only relevant to the desert should be calculated just for those who end up in town.
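The 100-hitchhiker variant can be sketched numerically. The payoffs are again illustrative assumptions (town worth 1,000,000, desert death 0, fee $50); the point is only how the choice of averaging population changes the verdict.

```python
# 100-hitchhiker Imperfect Parfit's Hitchhiker: one hitchhiker is picked up
# no matter what; the rest arrive in town only if everyone pays.
# Payoff numbers are assumptions for illustration.
TOWN, FEE, N = 1_000_000, 50, 100

def avg_utility(everyone_pays, town_only):
    arrivals = N if everyone_pays else 1
    per_arrival = TOWN - FEE if everyone_pays else TOWN
    population = arrivals if town_only else N  # who we average over
    return arrivals * per_arrival / population

# Averaging over all 100 hitchhikers, paying wins; averaging only over those
# who arrive in town, defecting wins, because 99 desert deaths drop out.
print(avg_utility(True, town_only=False), avg_utility(False, town_only=False))
print(avg_utility(True, town_only=True), avg_utility(False, town_only=True))
```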

Where does this leave us? UDT could technically represent Proposal 1, but, in addition to the issue with empty reference classes, this seems to be an abuse of the formalisation. Proposal 2 is incoherent. Proposal 3 is very natural for UDT, but leads to irrelevant considerations affecting our decision.

So UDT doesn’t seem to tell us much about what we ought to do, nor provide a solution; and even if it did, the specific approach would need to be justified rather than merely assumed.

What if we rejected personal identity?

If we argued that you shouldn’t care any more about what are traditionally seen as your future observer moments than anyone else’s, none of the scenarios discussed above would pose an issue. You would simply care about the average or total utility of all future person moments independent of whose they might appear to be. Of course, this would be a radical shift for most people.

What if we said that there was no best decision?

All of the above theories choose the rotten eggs in the Evil Genie problem, but none of them seem to give an adequate answer to the complaint that the decision isn’t reflectively consistent. So it seems like a reasonable proposal to suggest that the notion of a “best decision” depends on there being a fixed reference class. This would mean that there would be no real answer to Imperfect Parfit’s Hitchhiker. It would also require significant modifications to UDT, but this is currently the direction that I’m leaning.
