“Solving” selfishness for UDT

With many thanks to Beluga and lackofcheese.

When trying to decide between SIA and SSA, two anthropic probability theories, I concluded that the question of anthropic probability is badly posed and that it depends entirely on the values of the agents. When debating the issue of personal identity, I concluded that the question of personal identity is badly posed and depends entirely on the values of the agents. When the issue of selfishness in UDT came up recently, I concluded that the question of selfishness is...

But let’s not get ahead of ourselves.

A selfish scenario

Using Anthropic Decision Theory, I demonstrated that selfish agents using UDT should reason in the same way that average utilitarians do: essentially behaving ‘as if’ SSA were true and going for even odds of heads and tails (“halfer”) in the Sleeping Beauty problem.

Then Beluga posted an argument involving gnomes that seemed to show that selfish UDT agents should reason as total utilitarians do: essentially behaving ‘as if’ SIA were true and going for 2:1 odds of tails over heads (“thirder”) in the Sleeping Beauty problem. After a bit of back and forth, lackofcheese then refined the argument. I noticed that the refined argument was solid, and that it incidentally made the gnomes unnecessary.

How does the argument go? Briefly, a coin is flipped and an incubator machine creates either one person (on heads) or two people (on tails), each in separate rooms.

Without knowing what the coin flip was or how many people there are in the universe, every new person is presented with a coupon that pays £1 if the coin came up tails. The question is, assuming utility is linear in money, what amount £x should the created person(s) pay for this coupon?

The argument from Beluga/lackofcheese can be phrased like this. Let’s name the people in the tails world, calling them Jack and Roger (yes, they like dressing like princesses; what of it?). Each of them reasons something like this:

“There are four possible worlds here. In the tails world, I, Jack/Roger, could exist in Room 1 or in Room 2. And in the heads world, it could be either me existing in Room 1, or the other person existing in Room 1 (in which case I don’t exist). I’m completely indifferent to what happens in worlds where I don’t exist (sue me, I’m selfish). So if I buy the coupon for £x, I expect to make utility: 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5 - 0.75x. Therefore I will buy the coupon for x < £2/3.”

That seems a rather solid argument (at least, if you allow counterfactuals into worlds where you don’t exist, which you probably should). So it seems I was wrong, and that selfish agents will indeed go for the SIA-like “thirder” position.

Not so fast...

Another selfish scenario

The above argument reminded me of one I made a long time ago, when I “proved” that SIA was true. I subsequently discarded that argument after looking more carefully into the motivations of the agents. So let’s do that now.

Above, I was using a subtle intuition pump by using the separate names Jack and Roger. That gave connotations of “I, Jack, don’t care about worlds in which I, Jack, don’t exist...” But in the original formulation of the Sleeping Beauty/incubator problem, the agents were strictly identical! There are no Jack-versus-Roger issues; at most, these are labels, like 1 and 2.

It therefore seems possible that the selfish agent could reason:

“There are three possible worlds here. In the tails world, I either exist in Room 1 or Room 2. And in the heads world, either I exist in Room 1, or an identical copy of me exists in Room 1 and is the only copy of me in that world. I fail to see any actual difference between those two scenarios. So if I buy the coupon for £x, I expect to make utility: 0.5(-x) + 0.5(1-x) = 0.5 - x. Therefore I will buy the coupon for x < £1/2.”

The selfish agent seems on rather solid ground here in their heads-world reasoning. After all, would we treat someone else differently if we were told “That’s not actually your friend; instead it’s a perfect copy of your friend, while the original never existed”?

Notice that even if we do allow for the Jack/Roger distinction, it seems reasonable for the agent to say “If I don’t exist, I value the person that most closely resembles me.” After all, we all change from moment to moment, and we value our future selves. This idea is akin to Nozick’s “closest continuer” concept.

Each selfish person is selfish in their own unique way

So what is really going on here? Let’s call the first selfish agent a thirder-selfish agent, and the second a halfer-selfish agent. Note that both types of agents have perfectly consistent utility functions, defined in all possible actual and counterfactual universes (after giving the thirder-selfish agent some arbitrary constant utility C, which we may as well set to zero, in worlds where they don’t exist). Compare the two versions of Jack’s utility:

| | Jack in Room 1 | Roger in Room 1 |
|---|---|---|
| Heads: buy coupon | -x / -x | **0 / -x** |
| Heads: reject coupon | 0 / 0 | 0 / 0 |
| Tails: buy coupon | 1-x / 1-x | 1-x / 1-x |
| Tails: reject coupon | 0 / 0 | 0 / 0 |

The utilities are given as thirder-selfish utility / halfer-selfish utility. The one situation where there is a divergence is indicated in bold; that one difference is key to their different decisions.
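The two break-even prices can be checked numerically. This is only a sketch of the arithmetic in the two agents’ monologues above; the probability weights are their equal-probability worlds, nothing new:

```python
# Expected utility of buying the coupon at price x, for each agent.

def thirder_eu(x):
    # Four worlds: heads & the other person exists (0, since I don't
    # exist and don't care), heads & I exist (-x), tails & I'm in
    # Room 1 (1 - x), tails & I'm in Room 2 (1 - x).
    return 0.25 * 0 + 0.25 * (-x) + 0.5 * (1 - x)

def halfer_eu(x):
    # Three worlds, with the two heads cases identified: heads (-x,
    # whichever copy it is), tails (1 - x).
    return 0.5 * (-x) + 0.5 * (1 - x)

# Break-even prices: 2/3 for the thirder-selfish agent, 1/2 for the halfer.
assert abs(thirder_eu(2 / 3)) < 1e-9
assert abs(halfer_eu(1 / 2)) < 1e-9
assert thirder_eu(0.6) > 0 > halfer_eu(0.6)  # a price only the thirder pays
```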

At this point, people could be tempted to argue about which type of agent is genuinely the selfish agent… But I can finally say:

• The question of selfishness is badly posed and depends entirely on the values of the agents.

What do I mean by that? Well, here is a selfish utility function: “I expect all future copies of Stuart Armstrong to form a single continuous line through time, changing only slowly, and I value the happiness (or preference satisfaction) of all these future copies. I don’t value future copies of other people.”

That seems pretty standard selfishness. But this is not a utility function; it’s a partial description of a class of utility functions, defined only in one set of universes (the set where there’s a single future timeline for me, without any “weird” copying going on). Both the thirder-selfish utility function and the halfer-selfish one agree in such single-timeline universes. They are therefore both extensions of the same partial selfish utility to more general situations.

Arguing which is “correct” is pointless. Both possess all the features of selfishness we’ve used in everyday scenarios to define the term. We’ve enlarged the domain of possible scenarios beyond the usual set, so our concepts, forged in the usual set, can extend in multiple ways.

You could see the halfer-selfish values as a version of the “Psychological Approach” to personal identity: it values the utility of the being closest to itself in any world. A halfer-selfish agent would cheerfully step into a teleporter where they are scanned, copied to a distant location, and then the original is destroyed. The thirder-selfish agent might not, because the thirder-selfish agent is actually underspecified: the most extreme version would be one that does not value any future copies of themselves. They would indeed “jump off a cliff knowing smugly that a different person would experience the consequence of hitting the ground.” Most versions of the thirder-selfish agent that people have in mind are less extreme than that, but defining either agent requires quite a bit of work, not simply the single word “selfish”.

So it’s no wonder that UDT has difficulty with selfish agents: the concept is not well defined. “Selfish agent” is like “featherless biped”: a partial definition that purports to be the whole of the truth.

Personal identity and values

Each different view of personal identity can be seen as isomorphic to a particular selfish utility function. The isomorphism is simple: care about the utility of another agent if and only if they share the same personal identity as you.

For instance, the psychological approach to personal identity posits that “You are that future being that in some sense inherits its mental features—beliefs, memories, preferences, the capacity for rational thought, that sort of thing—from you; and you are that past being whose mental features you have inherited in this way.” Thus a psychological selfish utility function would value the preferences of a being that was connected to the agent in this way.

The somatic approach posits that “our identity through time consists in some brute physical relation. You are that past or future being that has your body, or that is the same biological organism as you are, or the like.” Again, this can be used to code up a utility function.

Those two approaches (psychological and somatic) are actually broad categories of approaches, each member of which would have a slightly different “selfish” utility function. The non-branching view, for instance, posits that if there is only one future copy of you, that copy is you, but if there are two, there is no you (you’re effectively dead if you duplicate). This seems mildly ridiculous, but it still expresses very clear preferences over possible worlds that can be captured in a utility function.

Some variants allow for partial personal identity. For instance, discounting could be represented by a utility function that puts less weight on copies more distant in the future. If you allow “almost identical copies”, these could be represented by a utility function that gives partial credit for similarity along some scale (this would tend to give a decision somewhere in between the thirder and halfer positions presented above).
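As a sketch of that last point: give the other heads-world copy a similarity weight s between 0 (pure thirder-selfish) and 1 (pure halfer-selfish). The interpolation formula is my own illustration, obtained by setting the resulting expected utility to zero:

```python
# Break-even coupon price for an agent giving weight s in [0, 1] to the
# "almost identical" copy in the heads world.
# EU(x) = 0.25*(-x) + 0.25*s*(-x) + 0.5*(1 - x); solving EU(x) = 0 gives:

def break_even_price(s):
    return 0.5 / (0.75 + 0.25 * s)

assert abs(break_even_price(0) - 2 / 3) < 1e-9  # thirder price
assert abs(break_even_price(1) - 1 / 2) < 1e-9  # halfer price
assert 1 / 2 < break_even_price(0.5) < 2 / 3    # partial credit: in between
```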

Many of the “paradoxes of identity” dissolve entirely when one uses values instead of identity. Consider the intransitivity problem for some versions of psychological identity:

    First, suppose a young student is fined for overdue library books. Later, as a middle-aged lawyer, she remembers paying the fine. Later still, in her dotage, she remembers her law career, but has entirely forgotten not only paying the fine but everything else she did in her youth. [...] the young student is the middle-aged lawyer, the lawyer is the old woman, but the old woman is not the young student.

In terms of values, this problem is non-existent: the young student values herself, the lawyer, and the old woman (as does the lawyer), but the old woman only values herself and the lawyer. That value system is inelegant, perhaps, but it’s not ridiculous (and “valuing past copies” might be decision-relevant in certain counterfactual situations).

Similarly, consider the question of whether it is right to punish someone for the law-breaking of a past copy of themselves. Are they the same person? What if, due to an accident or high technology, the present copy has no memory of the law-breaking or of being the past person? Using identity gets this hopelessly muddled, but from a consequentialist deterrence perspective, the answer is simple. The past copy presumably valued their future copy staying out of jail. Therefore, from the deterrence perspective, we should punish the current copy to deter such actions. In courts today, we might allow amnesia as a valid excuse, simply because amnesia is so hard and dangerous to produce deliberately. But this may change in the future: if it becomes easy to rewire your own memory, then deterrent punishment will need to move beyond classical notions of identity and punish people we would currently consider blameless.

Evolution and identity

Why are we convinced that there is such a thing as selfishness and personal identity? Well, let us note that it is in the interest of evolution that we believe in it. The “interests” of the genes are to be passed on, and so they benefit if the carrier of the gene in the present values the survival of the (same) carrier of the gene in the future. The gene does not “want” the carrier to jump off a cliff, because whatever the issues of personal identity, it’ll be the same gene in the body that gets squashed at the end. Similarly, future copies of yourself are the copies that you have the most control over, through your current actions. So genes have exceptionally strong interests in making you value “your” future copies. Even your twin is not as valuable as you: genetically you’re equivalent, but your current decisions have less impact on them than on your future self. Thus is selfishness created.

It seems that evolution has resulted in human copies with physical continuity, influence over (future) and memories of (past) copies, and very strong cross-time caring between copies. These features are unique to a single timeline of copies, so it’s no wonder people have seen them as “defining” personal identity. And “the person tomorrow is me” is probably more compact than saying that you care about the person tomorrow and listing the features connecting you. In the future, the first two components may become malleable, leaving only caring (a value) as the remnant of personal identity.

This idea allows us to do something we generally can’t: directly compare the “quality” of value systems, at least from the evolutionary point of view, according to the value system’s own criteria.

Here is an example of an inferior selfish decision theory: agents using CDT and valuing all future versions of themselves, but not any other copies. Why is this inferior? Because if the agent is duplicated, they want those duplicates to cooperate and value each other equally, since that gives the current agent the best possible expected utility. But if each copy has the same utility function as the agent started with, then CDT guarantees rivalry, probably to the detriment of every agent. In effect, the agent wants its future selves to have different selfish/indexical values from the ones it has, in order to preserve the same overall values.

This problem can be avoided by using UDT, CDT with precommitments, or a selfish utility function that values all copies equally. Those three are more “evolutionarily stable”. So, for instance, is a selfish utility function with an exponential discount rate, but not one with any other discount rate. This is an interesting feature of this approach: the set of evolutionarily stable selfish decision theories is smaller than the set of selfish decision theories. Thus there are many circumstances where different selfish utilities will give the same decisions under the same decision theory, or where the different decision theories/utilities will self-modify to make identical decisions.
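The rivalry point can be illustrated with a toy prisoner’s dilemma between two duplicates (the payoff numbers here are my own invention, purely illustrative). A copy that values only its own payoff has defection as a dominant strategy under CDT-style reasoning, while a copy that values both payoffs equally cooperates, leaving each copy, and hence the original agent, better off:

```python
# (my_action, other_action) -> (my_payoff, other_payoff)
PAYOFF = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def best_response(other_action, values_copy):
    # A CDT-style copy picks the action maximising its utility while
    # holding the other copy's action fixed.
    def utility(my_action):
        mine, theirs = PAYOFF[(my_action, other_action)]
        return mine + theirs if values_copy else mine
    return max(("C", "D"), key=utility)

# Purely selfish copies: defection is the best response to anything,
# so they end up at (D, D), worth 1 each.
assert best_response("C", values_copy=False) == "D"
assert best_response("D", values_copy=False) == "D"
# Copies that value each other equally sustain (C, C), worth 2 each,
# which is what the original agent would have wanted.
assert best_response("C", values_copy=True) == "C"
```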

One would like to make an argument about Rawlsian veils of ignorance and UDT-like initial precommitments leading to general altruism, or something… But that’s another argument, for another time. Note that this kind of argument cannot be used against the most ridiculous selfish utility function of all: “me at every moment is a different person whom I don’t value at all”. Someone with that utility function will quickly die, but, according to their own utility function, that isn’t a problem.

To my mind, the interesting thing here is that while there are many “non-indexical” utility functions that are stable under self-modification, this is not the case for most selfish and indexical ones.