Treating anthropic selfish preferences as an extension of TDT


When preferences are selfless, anthropic problems are easily solved by a change of perspective. For example, if we do a Sleeping Beauty experiment for charity, all Sleeping Beauty has to do is follow the strategy that, from the charity’s perspective, gets them the most money. This turns out to be an easy problem to solve, because the answer doesn’t depend on Sleeping Beauty’s subjective perception.

But selfish preferences—like being at a comfortable temperature, eating a candy bar, or going skydiving—are trickier, because they do rely on the agent’s subjective experience. This trickiness really shines through when there are actions that can change the number of copies. For recent posts about these sorts of situations, see Pallas’ sim game and Jan_Ryzmkowski’s tropical paradise. I’m going to propose a model that makes answering these sorts of questions almost as easy as playing for charity.

To quote Jan’s problem:

It’s a cold cold winter. Radiators are hardly working, but it’s not why you’re sitting so anxiously in your chair. The real reason is that tomorrow is your assigned upload, and you just can’t wait to leave your corporality behind. “Oh, I’m so sick of having a body, especially now. I’m freezing!” you think to yourself, “I wish I were already uploaded and could just pop myself off to a tropical island.”

And now it strikes you. It’s a weird solution, but it feels so appealing. You make a solemn oath (you’d say one in million chance you’d break it), that soon after upload you will simulate this exact scene a thousand times simultaneously and when the clock strikes 11 AM, you’re gonna be transposed to a Hawaiian beach, with a fancy drink in your hand.

It’s 10:59 on the clock. What’s the probability that you’d be in a tropical paradise in one minute?

So question one is the probability question: what’s your probability that you go to the tropical paradise? And question two is the decision problem: is this actually a good idea?

The probability question is straightforward, and the answer is indeed about a 1000/1001 chance of tropical paradise. If this does not make sense, feel free to ask about it, or go check out these two rambling complementary posts: Deriving probabilities from causal diagrams, and More marbles and Sleeping Beauty.
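The 1000/1001 figure is just a counting argument. Here is a minimal sketch of it, under the assumption that each subjectively identical observer-moment at 10:59 gets equal weight:

```python
# Counting argument for Jan's tropical paradise.
# Assumption: conditioning on your existence spreads probability
# evenly over all subjectively identical people at 10:59.

originals = 1        # the cold, pre-upload you
simulations = 1000   # copies of this exact scene, simulated after upload

# Only the simulated people get transposed to the beach at 11:00,
# so the chance of paradise is the fraction of people who are copies.
p_paradise = simulations / (originals + simulations)
print(p_paradise)  # 1000/1001, roughly 0.999
```

The key point is that nothing here depends on which person you "really" are; it only depends on how many of the subjectively identical people end up on the beach.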

One might then make an argument about the decision question that goes like this: Before I swore this oath, my probability of going to a tropical island was very low. Afterward, it was very high. Since I really like tropical islands, this is a great idea. In a nutshell, I have increased my expected utility by making this oath.

The counterargument is also simple, though: Making copies of myself has no causal effect on me. Swearing this oath does not move my body to a tropical paradise. What really happens is that I sit there in the cold just the same, but then later I make some simulations in which I lie to myself. This is not a higher-utility universe than the one where I don’t swear the oath.

Hopefully you can see how this is confusing.


So, my proposal, in short form: You are a person. I mean this not in the abstract, non-causal sense, where if I make a copy of you and then shoot you, “you live on.” I mean that the isolated causal agent reading this is a person capable of selfish desires, such that if you are one of two copies and I give the other copy a candy bar, your selfish desire to eat candy goes unfulfilled1. Choose as if you were controlling the output of your decision algorithm, so that you maximize your expected utility, including selfish desires (if you have them), conditioned on the fact that you exist (I’ll come back to what this last bit means in part III).

This is, at its heart, porting TDT to anthropic problems. When there is a decision your original body can make that creates a bunch of copies, and the copies are also faced with this decision, your decision lets you control whether you are the original or a copy. If you don’t want to be a copy, as in Pallas’ sim game, you have to take that into account. If you do want to be a copy, you take that into account too.

This leads to biting the bullet in Jan’s tropical paradise. It is actually a good idea to take an action that, if you’re the original body, creates a bunch of high-selfish-expected-utility copies that also undergo the decision you’re making right now, because this decision controls whether you’re one of those copies.

There is an important caveat: this only holds if you truly would like to be one of those copies. To repeat the short form, this decision algorithm assumes that you are a person trying to increase their own expected utility. These copies can’t just be made and disposed of to manipulate your subjective experience—something which is possible, but is a bad idea. The copies have to be people who you would actually like to be, who go on to live long, fulfilling lives. This is not about gaming the system. It’s just an extension of TDT to anthropic situations.

Interestingly, this usually gives the same results as playing for charity. Thus there’s a sort of locality of money, where you make similar tradeoffs between selfish spending and charitable spending no matter how many copies of you there are.

To deliberately construct an intuitive case, imagine that you are already uploaded, and you’re led into a room (a simulated room, of course) where Omega is waiting for you. Omega says hello, and asks you whether you think you’re the original or a copy. “Huh? Have I been copied?” you say. Excellent, Omega says. It then presents two boxes to you, box A and box B. Box A always has some okay candy in it for you to eat (eating candy is my go-to selfish reward). If you pick box B and are the original, it is empty, but you will be copied a million times from a snapshot taken when you entered the room, and offered the same choice—and if you are a copy, box B contains very delicious candy to eat (and then the copies go on to live similar lives to the original). Again there’s the odd property that the output of your decision algorithm controls whether you are likely a copy or not. If you would prefer to be a copy, then you should pick box B.

There’s a pre­com­mit­ment prob­lem here. Sup­pose I value my fu­ture selves by a sum of their util­ities (given some zero point). Then even if be­ing a copy was not so great (like in Pal­las’ sim game), I’d pre­com­mit to mak­ing as many copies as pos­si­ble. But once the game starts, by my defi­ni­tion of self­ish prefer­ences I don’t care much about whether the other copies get a self­ish re­ward, and so I might try to fight that pre­com­mit­ment to raise my ex­pected util­ity.

In fact, these pre­com­mit­ment prob­lems crop up when­ever I calcu­late ex­pected value in any other way than by av­er­ag­ing util­ity among fu­ture copies. This is a state­ment about a small piece of pop­u­la­tion ethics, and as such, should be highly sus­pect—the fact that my preferred model of self­ish prefer­ences says any­thing about even this small sub­set of pop­u­la­tion ethics makes me sig­nifi­cantly less con­fi­dent that I’m right. Even though the thing it’s say­ing seems sen­si­ble.

Footnote 1: The reader who has been following my posts may note how this identification, via causality, of who has the preferences makes selfish preferences well-defined no matter how many times I define the pattern “I” to map to my brain. This is good because it makes the process well-defined, but also somewhat difficult, because it eliminates the last dependence on a lower level where we can think of anthropic probabilities as determined a priori, rather than depending on a definition of self grounded in decision-making as well as experiencing. On the other hand, with that level conflict gone, maybe there’s nothing stopping us from thinking of anthropic probabilities on this more contingent level as “obvious” or “a priori.”


It’s worth bring­ing up Eliezer’s an­thropic trilemma (fur­ther thought by Katja Grace here). The idea is to sub­jec­tively ex­pe­rience win­ning the lot­tery by en­ter­ing a lot­tery and then repli­cat­ing your­self a trillion times, wake up to have the ex­pe­rience, and then merge back to­gether. Thus, the ar­gu­ment goes, as long as prob­a­bil­ity flows along causal chan­nels, by wak­ing up a trillion times I have cap­tured the sub­jec­tive ex­pe­rience, and will go on to sub­jec­tively ex­pe­rience win­ning the lot­tery.

Again we can ask the two questions: What are the probabilities? And is this actually a good idea?

This is the part where I come back to explain that earlier terminology—why is it important that I specified that you condition on your own existence? When you condition on the fact that you exist, you get an anthropic probability. In the story about Omega I told above, your probability that you’re the original before you enter the room is 1. But after you enter the room, if your decision algorithm chooses box B, your probability that you’re the original should go down to one in a million. This update is possible because you’re updating on new information about where you are in the game—you’re conditioning on your own existence.
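The update in the Omega story can be sketched numerically. This is a hedged illustration, again assuming equal weight on each subjectively identical person:

```python
from fractions import Fraction

# Omega's box game: one original, plus a million copies made only if
# your decision algorithm outputs "box B". (The million is the number
# used in the story above; everything else is illustrative.)
copies_if_b = 1_000_000

# Before entering the room, no copies exist yet.
p_original_before = Fraction(1, 1)

# After entering the room, conditioned on existing and on your
# algorithm choosing box B, you are one of 1 + 1,000,000 people
# with identical experiences.
p_original_after_b = Fraction(1, 1 + copies_if_b)

print(p_original_before)   # 1
print(p_original_after_b)  # 1/1000001, i.e. about one in a million
```

Note that the posterior depends on the output of your own decision algorithm: if the algorithm outputs box A, no copies are ever made and the probability of being the original stays at 1.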

Note that I did not just say “use anthropic probabilities.” When calculating expected utility, you condition on your own existence, but you most certainly do not condition on future selves’ existence. After all, you might get hit by a meteor and die, so you don’t actually know that you’ll be around tomorrow, and you shouldn’t condition on things you don’t know. Thus the player at Russian roulette who says “It’s okay, I’ll subjectively experience winning!” is making a decision by conditioning on information they do not have.

Katja Grace talks about two principles acting in the Anthropic Trilemma: Follow The Crowd, which sends your subjective experience into the branch with more people, and Blatantly Obvious Principle, which says that your subjective experience should follow causal paths. Katja points out that they do not just cause problems when merging, they also conflict when splitting—so Eliezer is being selective in applying these principles, and there’s a deeper problem here. If you recall me mentioning my two-fluid model of anthropics, I partially resolved this by tracking two measures, one that obeyed FTC (subjective probability), and one that obeyed BOP (magic reality fluid).

But the model I’m presenting here dissolves those fluids (or would it be ‘dilutes’?): the thing that follows the crowd is who you think you are, and the blatantly obvious thing is your expectation for the future. There’s no subjective-experience fluid that it’s possible to push around without changing the physical state of the universe. There’s just people.

To give the probabilities in the Anthropic Trilemma, it is important to track what information you’re conditioning on. If I condition on my existence just after I buy my ticket, my probability that I picked the winning numbers is small; no matter what anthropic hijinks might happen if I win, I still expect to see those hijinks happen with low probability2. If I condition on the fact that I wake up after possibly being copied, my probability that I picked the winning numbers is large, as is my probability that I will have picked the winning numbers in the future, even if I get copied or merged or what have you. Then I learn the result, and no longer have a single state of information which would give me a probability distribution. Compare this to the second horn of the trilemma; it’s easy to get mixed up when giving probabilities if there’s more than one set of probabilities to give.

Okay, so those are the probabilities—but is this actually a good idea? Suppose I’m just in it for the money. So I’m standing there considering whether to buy a ticket, and I condition on my own existence, and the chances of winning still look small, and so I don’t buy the ticket. That’s it. This is especially clear if I donate my winnings to charity—the only winning move is not to play the lottery.

Suppose then instead that I have a selfish desire to experience winning the lottery, independent of the money—does copying myself if I win help fulfill this desire? Or to put this another way, in calculating expected utility we weight the selfish utility of the many winning copies less because winning is unlikely, but do we weight it more because there are more of them?

This question is resolved by (possible warning sign) the almost-population-ethics result above, which says that as an attractor of self-modification we should average copies’ utilities rather than summing them, and so copying does not increase expected utility. Again, I find this incompletely convincing, but it does seem to be the extension of TDT here. So this procedure does not bite the bullet in the anthropic trilemma. But remember the behavior in Jan’s tropical paradise game? It is in fact possible to design a procedure that lets you satisfy your desire to win the lottery—just have the copies created when you win start from a snapshot of yourself taken before you bought the lottery ticket.

This is a weird bullet to bite. It’s like, how come it’s a good idea to create copies that go through the decision to create copies, but only a neutral idea to create copies that don’t? After all, winning and then creating simulations has the same low chance no matter what. The difference is entirely anthropic—only when the copies also make the decision does the decision control whether you’re a copy.

Footnote 2: One might complain that if you know what you’ll expect in the future, you should update to believing that in the present. But if I’m going to be copied tomorrow, I don’t expect to be a copy today.
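The contrast between summing and averaging can be made concrete. Here is an illustrative sketch; the specific numbers are mine, not the post’s:

```python
# Sum vs. average aggregation over copies, for the lottery scheme.
# Illustrative numbers only.

p_win = 1e-8         # chance of picking the winning numbers
u_win = 1.0          # selfish utility of experiencing the win
n_copies = 10**12    # copies created from a snapshot if you win

# Summing copies' utilities: expected utility grows linearly with the
# number of copies, so copying looks arbitrarily good. This is the
# precommitment attractor the almost-population-ethics result rules out.
eu_sum = p_win * (n_copies * u_win)

# Averaging copies' utilities: each copy's utility is u_win, so the
# average is u_win regardless of n_copies; copying adds nothing.
eu_avg = p_win * u_win

print(eu_sum, eu_avg)
```

Under averaging, the only thing the copying scheme can still change is the anthropic question of who you should expect to be, which is why the snapshot timing in Jan’s game matters while plain post-win copying does not.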


The problem of the Anthropic Trilemma is not actually gone, because if I’m indifferent to merging with my copies, there is some procedure that better fulfills my selfish desire to experience winning the lottery just by shuffling copies of me around: if I win, make a bunch of copies that start from a snapshot in the past, then merge the copies together.

So let’s talk about the merging. This is going to be the section with the unsolved problem.

Here’s what Eliezer’s post says about merging:

Just as computer programs or brains can split, they ought to be able to merge. If we imagine a version of the Ebborian species that computes digitally, so that the brains remain synchronized so long as they go on getting the same sensory inputs, then we ought to be able to put two brains back together along the thickness, after dividing them. In the case of computer programs, we should be able to perform an operation where we compare each two bits in the program, and if they are the same, copy them, and if they are different, delete the whole program. (This seems to establish an equal causal dependency of the final program on the two original programs that went into it. E.g., if you test the causal dependency via counterfactuals, then disturbing any bit of the two originals, results in the final program being completely different (namely deleted).)

In general, merging copies is some process where many identical copies go in, and only one comes out. If you know they’re almost certainly identical, why bother checking them, then? Why not just delete all but one? It’s the same pattern, after all.

Well, imagine that we performed a causal intervention on one of these identical copies—gave them candy or something. Now if we delete all but one, the effect of our intervention is erased with high probability. In short, if you delete all but one, the person who comes out is not actually the causal descendant of the copies who go in—it’s just one of the copies.

Just like how “selfish preferences” means that if I give another of your copies candy, that doesn’t fulfill your selfish desire for candy, if another of your copies is the one who gets out of the murder-chamber, that doesn’t fulfill your selfish desire to not get murdered. This is why Eliezer talks about going through the process of comparing each copy bit by bit and only merging them if they’re identical, so that the person who comes out is the causal descendant of all the people who go in.

On the other hand, Eliezer’s process is radically different from how things normally go. If I’m one of several copies, and a causal intervention gives me candy, and no merging shenanigans occur, then my causal descendant is me who’s had some candy. If I’m one of several copies, and a causal intervention gives me candy, and then we’re merged by Eliezer’s method, then my causal descendant is utterly annihilated.

If we allow the character of causal arrows to matter, and not merely their existence, then it’s possible that merging is not so neutral after all. But this seems like a preference issue independent of the definition of selfish preferences—although I would have said that about how to weight the preferences of multiple copies, too, and I would likely have been wrong.

Does the strange behavior permitted by the neutrality of merging serve as a reductio of that neutrality, or of this extension of selfish preferences to anthropic information, or neither? In the immortal words of Socrates, “… I drank what?”


A Problem:

This decision theory has precommitment issues. In the case of Jan’s tropical paradise, I want to precommit to creating satisfied copies from a snapshot of my recent self. But once I’m my future self, I don’t want to do it, because I know I’m not a copy.


This decision theory doesn’t have very many knobs to turn—it boils down to “choose the decision-algorithm output that causes maximum expected utility for you, conditioning on both the action and the information you possess.” This is somewhat good news, because we don’t much want free variables in a decision theory. But it’s a metaproblem, because it means that there’s no obvious knob to turn to eliminate the problem above—creativity is required.

One approach that has worked in the past is to figure out what global variable we want to maximize, and just apply UDT to this problem. But this doesn’t work for this decision theory—as we expected, because it doesn’t seem to work for selfish preferences in general. The selves at two different times in the tropical paradise problem just want to act selfishly—so are they allowed to be in conflict?

Solution Brainstorming (if one is needed at all):

One specific argument might run that when you precommit to creating copies, you decrease your amount of indexical information, and that this is just a form of lying to yourself and is therefore bad. I don’t think this works at all, but it may be worth keeping in mind.

A more promising line might be to examine the analogy to evidential decision theory. Evidential decision theory fails when there’s a difference between conditioning on the action and conditioning on a causal do(Action). What does the analogue look like for anthropic situations?


For somewhat of a resolution, see Selfish preferences and self-modification.