Treating anthropic selfish preferences as an extension of TDT


When preferences are selfless, anthropic problems are easily solved by a change of perspective. For example, if we do a Sleeping Beauty experiment for charity, all Sleeping Beauty has to do is follow the strategy that, from the charity’s perspective, gets them the most money. This turns out to be an easy problem to solve, because the answer doesn’t depend on Sleeping Beauty’s subjective perception.

But selfish preferences—like being at a comfortable temperature, eating a candy bar, or going skydiving—are trickier, because they do rely on the agent’s subjective experience. This trickiness really shines through when there are actions that can change the number of copies. For recent posts about these sorts of situations, see Pallas’ sim game and Jan_Ryzmkowski’s tropical paradise. I’m going to propose a model that makes answering these sorts of questions almost as easy as playing for charity.

To quote Jan’s problem:

It’s a cold cold winter. Radiators are hardly working, but it’s not why you’re sitting so anxiously in your chair. The real reason is that tomorrow is your assigned upload, and you just can’t wait to leave your corporality behind. “Oh, I’m so sick of having a body, especially now. I’m freezing!” you think to yourself, “I wish I were already uploaded and could just pop myself off to a tropical island.”

And now it strikes you. It’s a weird solution, but it feels so appealing. You make a solemn oath (you’d say one in million chance you’d break it), that soon after upload you will simulate this exact scene a thousand times simultaneously and when the clock strikes 11 AM, you’re gonna be transposed to a Hawaiian beach, with a fancy drink in your hand.

It’s 10:59 on the clock. What’s the probability that you’d be in a tropical paradise in one minute?

So question one is the probability question: what’s your probability that you go to the tropical paradise? And question two is the decision problem: is this actually a good idea?

The probability question is straightforward, and is indeed about a 1000/1001 chance of tropical paradise. If this does not make sense, feel free to ask about it, or go check out these two rambling complementary posts: “Deriving probabilities from causal diagrams” and “More marbles and Sleeping Beauty.”
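As a rough check on that figure, here is a minimal sketch in Python. It assumes that every instance experiencing the 10:59 scene counts equally (the original plus each simulation), weighted by the probability of the world it occurs in, and it uses the one-in-a-million chance of breaking the oath from Jan's story. The function name `p_paradise` and the weighting scheme are illustrative assumptions, not something specified in the original problem.

```python
from fractions import Fraction


def p_paradise(n_sims: int, p_break: Fraction) -> Fraction:
    """Chance that a given instance of 'you at 10:59' is one of the simulations.

    Assumption: each instance of the scene is weighted equally, scaled by
    the probability of the world in which it exists.
    """
    p_keep = 1 - p_break
    # Simulated instances only exist if the oath is kept.
    expected_sims = p_keep * n_sims
    # The original body sits in the cold room whether or not the oath is kept.
    expected_originals = Fraction(1)
    return expected_sims / (expected_sims + expected_originals)


p = p_paradise(1000, Fraction(1, 1_000_000))
print(float(p))                       # very slightly below 1000/1001
print(float(Fraction(1000, 1001)))    # the idealized figure with a perfectly kept oath
```

The small chance of breaking the oath barely moves the answer, which is why "about 1000/1001" is a fair summary.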

One might then make an argument about the decision question that goes like this: Before I swore this oath, my probability of going to a tropical island was very low. After, it was very high. Since I really like tropical islands, this is a great idea. In a nutshell, I have increased my expected utility by making this oath.

The counterargument is also simple, though: Making copies of myself has no causal effect on me. Swearing this oath does not move my body to a tropical paradise. What really happens is that I just sit there in the cold just the same, but then later I make some simulations where I lie to myself. This is not a higher-utility universe than the one where I don’t swear the oath.

Hopefully you can see how this is confusing.


So, my proposal, in short form: You are a person. I mean this not in the abstract, non-causal, sense, where if I make a copy of you and then shoot you, “you live on.” I mean that the isolated causal agent reading this is a person capable of selfish desires, where if you are one of two copies and I give the other copy a candy bar, your selfish desires for eating candy are unfulfilled[1]. Choose as if you were controlling the output of your decision algorithm, so that you maximize your expected utility, including selfish desires (if you have them), conditioned on the fact that you exist (I’ll come back to what this last bit means in part III).

This is at its heart porting TDT to anthropic problems. When there is a decision your original body can make that creates a bunch of copies, and the copies are also faced with this decision, your decision lets you control whether you are the original or a copy. If you don’t want to be a copy, as in Pallas’ sim game, you have to take that into account. If you do want to be a copy, you take that into account too.

This leads to biting the bullet in Jan’s tropical paradise. It is actually a good idea to take an action that, if you’re the original body, creates a bunch of high-selfish-expected-utility copies that also undergo the decision you’re making right now, because this decision controls whether you’re one of those copies.

There is an important caveat: this only holds if you truly would like to be one of those copies. To repeat the short form, this decision algorithm assumes that you are a person trying to increase their own expected utility. These copies can’t just be made and disposed of to manipulate your subjective experience—something which is possible, but is a bad idea. The copies have to be people who you would actually like to be, who go on to live long, fulfilling lives. This is not about gaming the system. It’s just an extension of TDT to anthropic situations.

Interestingly, this usually gives the same results as playing for charity. Thus there’s a sort of locality of money, where you make similar tradeoffs between selfish spending and charitable spending no matter how many copies of you there are.

To deliberately construct an intuitive case, imagine that you are already uploaded, and you’re led into a room (a simulated room, of course) where Omega is waiting for you. Omega says hello, and asks you whether you think you’re the original or a copy. “Huh? Have I been copied?” you say. Excellent, Omega says. It then presents two boxes to you, box A and box B. Box A always has some okay candy in it for you to eat (eating candy is my go-to selfish reward). If you pick box B and are the original, it is empty, but you will be copied a million times from a snapshot when you entered the room, and offered the same choice—and if you are a copy, box B contains very delicious candy to eat (and then the copies go on to live similar lives to the original). Again there’s the odd property that the output of your decision algorithm controls whether you are likely a copy or not. If you would prefer to be a copy, then you should pick box B.
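The Omega game can be sketched as a small expected-utility comparison. This is only an illustration under stated assumptions: the utility numbers (1 for okay candy, 3 for very delicious candy, 0 for an empty box) are hypothetical, and each instance facing the choice is treated as equally likely to be you. The names `expected_utility` and `best_choice` are made up for this example.

```python
def expected_utility(choice: str, n_copies: int,
                     u_okay: float, u_delicious: float) -> float:
    """Expected selfish utility for one instance whose algorithm outputs `choice`."""
    if choice == "A":
        # Nobody gets copied, so you are the original and eat the okay candy.
        return u_okay
    if choice == "B":
        # One original plus n_copies copies all face this same choice.
        p_original = 1 / (n_copies + 1)
        # The original finds box B empty (utility 0); copies get delicious candy.
        return p_original * 0.0 + (1 - p_original) * u_delicious
    raise ValueError("choice must be 'A' or 'B'")


def best_choice(n_copies: int, u_okay: float, u_delicious: float) -> str:
    """Pick the output that maximizes expected selfish utility."""
    return max(("A", "B"),
               key=lambda c: expected_utility(c, n_copies, u_okay, u_delicious))


# Hypothetical utilities: being a copy is better than the okay candy.
print(best_choice(1_000_000, u_okay=1.0, u_delicious=3.0))
# Hypothetical utilities: being a copy is worse than the okay candy.
print(best_choice(1_000_000, u_okay=1.0, u_delicious=0.5))
```

With these made-up numbers the first call favors box B and the second favors box A, matching the rule "if you would prefer to be a copy, pick box B."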

There’s a precommitment problem here. Suppose I value my future selves by a sum of their utilities (given some zero point). Then even if being a copy was not so great (like in Pallas’ sim game), I’d precommit to making as many copies as possible. But once the game starts, by my definition of selfish preferences I don’t care much about whether the other copies get a selfish reward, and so I might try to fight that precommitment to raise my expected utility.

In fact, these precommitment problems crop up whenever I calculate expected value in any other way than by averaging utility among future copies. This is a statement about a small piece of population ethics, and as such, should be highly suspect—the fact that my preferred model of selfish preferences says anything about even this small subset of population ethics makes me significantly less confident that I’m right. Even though the thing it’s saying seems sensible.
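To make the conflict concrete, here is a minimal sketch comparing the two ways of valuing future copies. Everything numeric is a hypothetical choice for illustration: a lone future self has utility 1.0, and each of 1000 copies has utility 0.2 (positive relative to the zero point, but worse than not being copied). The in-game calculation assumes you are equally likely to be any of the instances.

```python
def summed_value(utilities):
    """Value future selves by the sum of their utilities."""
    return sum(utilities)


def averaged_value(utilities):
    """Value future selves by the average of their utilities."""
    return sum(utilities) / len(utilities)


def in_game_expected_utility(utilities):
    """Selfish expected utility once the game has started.

    Assumption: conditioning on existing as one of these instances,
    each instance is equally likely to be you.
    """
    p_each = 1 / len(utilities)
    return sum(p_each * u for u in utilities)


no_copies = [1.0]                     # hypothetical: one future self
with_copies = [1.0] + [0.2] * 1000    # hypothetical: original plus worse-off copies

sum_says_copy = summed_value(with_copies) > summed_value(no_copies)
avg_says_copy = averaged_value(with_copies) > averaged_value(no_copies)
in_game_says_copy = (in_game_expected_utility(with_copies)
                     > in_game_expected_utility(no_copies))

print(sum_says_copy, avg_says_copy, in_game_says_copy)
```

Under these assumptions the summing precommitment says "make copies" while the in-game selfish agent says "don't," whereas the averaging precommitment agrees with the in-game agent.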

Footnote 1: The reader who has been following my posts may note that identifying who has the preferences via causality makes selfish preferences well-defined no matter how many times I define the pattern “I” to map to my brain. This is good because it makes the process well-defined, but also somewhat difficult, because it eliminates the last dependence on a lower level where we can think of anthropic probabilities as determined a priori, rather than depending on a definition of self grounded in decision-making as well as experiencing. On the other hand, with that level conflict gone, maybe there’s nothing stopping us from thinking of anthropic probabilities on this more contingent level as “obvious” or “a priori.”


It’s worth bringing up Eliezer’s anthropic trilemma (further thought by Katja Grace here). The idea is to subjectively experience winning the lottery by entering a lottery, replicating yourself a trillion times if you win, waking up to have the experience, and then merging back together. Thus, the argument goes, as long as probability flows along causal channels, by waking up a trillion times I have captured the subjective experience, and will go on to subjectively experience winning the lottery.

Again we can ask the two questions: What are the probabilities? And is this actually a good idea?

This is the part where I come back to explain that earlier terminology—why is it important that I specified that you condition on your own existence? When you condition on the fact that you exist, you get an anthropic probability. In the story about Omega I told above, your probability that you’re the original before you enter the room is 1. But after you enter the room, if your decision algorithm chooses box B, your probability that you’re the original should go down to one in a million. This update is possible because you’re updating on new information about where you are in the game—you’re conditioning on your own existence.

Note that I did not just say “use anthropic probabilities.” When calculating expected utility, you condition on your own existence, but you most certainly do not condition on future selves’ existence. After all, you might get hit by a meteor and die, so you don’t actually know that you’ll be around tomorrow, and you shouldn’t condition on things you don’t know. Thus the player at Russian roulette who says “It’s okay, I’ll subjectively experience winning!” is making a decision by conditioning on information they do not have.

Katja Grace talks about two principles acting in the Anthropic Trilemma: Follow The Crowd, which sends your subjective experience into the branch with more people, and Blatantly Obvious Principle, which says that your subjective experience should follow causal paths. Katja points out that they do not just cause problems when merging, they also conflict when splitting—so Eliezer is being selective in applying these principles, and there’s a deeper problem here. If you recall me mentioning my two-fluid model of anthropics, I partially resolved this by tracking two measures, one that obeyed FTC (subjective probability), and one that obeyed BOP (magic reality fluid).

But the model I’m presenting here dissolves those fluids (or would it be ‘dilutes’?): the thing that follows the crowd is who you think you are, and the blatantly obvious thing is your expectation for the future. There’s no subjective experience fluid that it’s possible to push around without changing the physical state of the universe. There’s just people.

To give the probabilities in the Anthropic Trilemma, it is important to track what information you’re conditioning on. If I condition on my existence just after I buy my ticket, my probability that I picked the winning numbers is small. No matter what anthropic hijinks might happen if I win, I still expect to see those hijinks happen with low probability[2]. If I condition on the fact that I wake up after possibly being copied, my probability that I picked the winning numbers is large, as is my probability that I will have picked the winning numbers in the future, even if I get copied or merged or what have you. Then I learn the result, and no longer have a single state of information which would give me a probability distribution. Compare this to the second horn of the trilemma; it’s easy to get mixed up when giving probabilities if there’s more than one set of probabilities to give.
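The two states of information can be put side by side in a short sketch. The ticket odds of one in a hundred million are a hypothetical number chosen for illustration (the trilemma only needs them to be small), the trillion copies come from the setup, and the "on waking" calculation assumes each instance that wakes up is weighted equally, scaled by the probability of its world.

```python
from fractions import Fraction


def p_win_after_buying(p_ticket: Fraction) -> Fraction:
    """Conditioning on existing just after buying: only one of you exists."""
    return p_ticket


def p_win_on_waking(p_ticket: Fraction, n_copies: int) -> Fraction:
    """Conditioning on waking up after possibly having been copied.

    Assumption: a win produces n_copies instances that wake up,
    a loss produces a single instance.
    """
    winners = p_ticket * n_copies
    losers = (1 - p_ticket) * 1
    return winners / (winners + losers)


P_TICKET = Fraction(1, 100_000_000)   # hypothetical lottery odds
N_COPIES = 10**12                     # "a trillion times"

print(float(p_win_after_buying(P_TICKET)))
print(float(p_win_on_waking(P_TICKET, N_COPIES)))
```

The first number stays tiny and the second is close to 1, which are the two different answers the paragraph above describes.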

Okay, so that’s the probabilities—but is this actually a good idea? Suppose I’m just in it for the money. So I’m standing there considering whether to buy a ticket, and I condition on my own existence, and the chances of winning still look small, and so I don’t buy the ticket. That’s it. This is especially clear if I donate my winnings to charity—the only winning move is not to play the lottery.

Suppose then instead that I have a selfish desire to experience winning the lottery, independent of the money—does copying myself if I win help fulfill this desire? Or to put this another way, in calculating expected utility we weight the selfish utility of the many winning copies less because winning is unlikely, but do we weight it more because there are more of them?

This question is resolved by (possible warning sign) the almost-population-ethics result above, which says that as an attractor of self-modification we should average copies’ utilities rather than summing them, and so copying does not increase expected utility. Again, I find this incompletely convincing, but it does seem to be the extension of TDT here. So this procedure does not bite the bullet in the anthropic trilemma. But remember the behavior in Jan’s tropical paradise game? It is in fact possible to design a procedure that lets you satisfy your desire to win the lottery—just have the copies created when you win start from a snapshot of yourself before you bought the lottery ticket.

This is a weird bullet to bite. It’s like, how come it’s a good idea to create copies that go through the decision to create copies, but only a neutral idea to create copies that don’t? After all, winning and then creating simulations has the same low chance no matter what. The difference is entirely anthropic—only when the copies also make the decision does the decision control whether you’re a copy.
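Here is one way to sketch that asymmetry numerically. It is an interpretation, not a definitive formalization: it assumes copies' utilities are averaged as described above, that a copy started from a pre-ticket snapshot goes on to experience winning, that instances facing the decision are weighted equally (scaled by the probability of their world), and it uses hypothetical numbers for the odds and the utility of winning.

```python
def eu_copies_from_after_win(p_win: float, n_copies: int, u_win: float) -> float:
    """Copies start from a snapshot taken after the win.

    At decision time only the original exists, so the chance of winning is
    just p_win. If I win, averaging over the copies gives u_win, not
    n_copies * u_win.
    """
    averaged_if_win = (n_copies * u_win) / n_copies
    return p_win * averaged_if_win + (1 - p_win) * 0.0


def eu_copies_from_before_ticket(p_win: float, n_copies: int, u_win: float) -> float:
    """Copies start from a snapshot taken before buying the ticket.

    Now the copies also face the decision, so I might be one of them.
    Assumption: a copy goes on to experience winning.
    """
    expected_copies = p_win * n_copies
    p_copy = expected_copies / (expected_copies + 1)
    # A copy experiences winning; the original still only wins with p_win.
    return p_copy * u_win + (1 - p_copy) * p_win * u_win


P_WIN = 1e-8       # hypothetical lottery odds
N = 10**12         # a trillion copies
U_WIN = 1.0        # hypothetical selfish utility of experiencing a win

print(eu_copies_from_after_win(P_WIN, N, U_WIN))
print(eu_copies_from_before_ticket(P_WIN, N, U_WIN))
```

Under these assumptions the first procedure is worth no more than an ordinary ticket, while the second is worth nearly the full utility of winning, even though the copies get created with the same low probability in both cases.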

Footnote 2: One might complain that if you know what you’ll expect in the future, you should update to believing that in the present. But if I’m going to be copied tomorrow, I don’t expect to be a copy today.


The problem of the Anthropic Trilemma is not actually gone, because if I’m indifferent to merging with my copies, there is some procedure that better fulfills my selfish desire to experience winning the lottery just by shuffling copies of me around: if I win, make a bunch of copies that start from a snapshot in the past, then merge the copies together.

So let’s talk about the merging. This is going to be the section with the unsolved problem.

Here’s what Eliezer’s post says about merging:

Just as computer programs or brains can split, they ought to be able to merge. If we imagine a version of the Ebborian species that computes digitally, so that the brains remain synchronized so long as they go on getting the same sensory inputs, then we ought to be able to put two brains back together along the thickness, after dividing them. In the case of computer programs, we should be able to perform an operation where we compare each two bits in the program, and if they are the same, copy them, and if they are different, delete the whole program. (This seems to establish an equal causal dependency of the final program on the two original programs that went into it. E.g., if you test the causal dependency via counterfactuals, then disturbing any bit of the two originals, results in the final program being completely different (namely deleted).)

In general, merging copies is some process where many identical copies go in, and only one comes out. If you know they’re almost certainly identical, why bother checking them, then? Why not just delete all but one? It’s the same pattern, after all.

Well, imagine that we performed a causal intervention on one of these identical copies—gave them candy or something. Now if we deleted all but one, the effect of our intervention is erased with high probability. In short, if you delete all but one, the person who comes out is not actually the causal descendant of the copies who go in—it’s just one of the copies.

Just like how “selfish preferences” means that if I give another of your copies candy, that doesn’t fulfill your selfish desire for candy, if another of your copies is the one who gets out of the murder-chamber, that doesn’t fulfill your selfish desire to not get murdered. This is why Eliezer talks about going through the process of comparing each copy bit by bit and only merging them if they’re identical, so that the person who comes out is the causal descendant of all the people who go in.

On the other hand, Eliezer’s process is radically different from how things normally go. If I’m one of several copies, and a causal intervention gives me candy, and no merging shenanigans occur, then my causal descendant is me who’s had some candy. If I’m one of several copies, and a causal intervention gives me candy, and then we’re merged by Eliezer’s method, then my causal descendant is utterly annihilated.

If we allow the character of causal arrows to matter, and not merely their existence, then it’s possible that merging is not so neutral after all. But this seems like a preference issue independent of the definition of selfish preferences—although I would have said that about how to weight preferences of multiple copies, too, and I would likely have been wrong.

Does the strange behavior permitted by the neutrality of merging serve as a reductio of that neutrality, or of this extension of selfish preferences to anthropic information, or neither? In the immortal words of Socrates, “… I drank what?”


A Problem:

This decision theory has precommitment issues. In the case of Jan’s tropical paradise, I want to precommit to creating satisfied copies from a snapshot of my recent self. But once I’m my future self, I don’t want to do it because I know I’m not a copy.


This decision theory doesn’t have very many knobs to turn—it boils down to “choose the decision-algorithm output that causes maximum expected utility for you, conditioning on both the action and the information you possess.” This is somewhat good news, because we don’t much want free variables in a decision theory. But it’s a metaproblem because it means that there’s no obvious knob to turn to eliminate the problem above—creativity is required.

One approach that has worked in the past is to figure out what global variable we want to maximize, and just do UDT to this problem. But this doesn’t work for this decision theory—as we expected, because it doesn’t seem to work for selfish preferences in general. The selves at two different times in the tropical paradise problem just want to act selfishly—so are they allowed to be in conflict?

Solution Brainstorming (if one is needed at all):

One specific argument might run that when you precommit to creating copies, you decrease your amount of indexical information, and that this is just a form of lying to yourself and is therefore bad. I don’t think this works at all, but it may be worth keeping in mind.

A more promising line might be to examine the analogy to evidential decision theory. Evidential decision theory fails when there’s a difference between conditioning on the action and conditioning on a causal do(Action). What does the analogue look like for anthropic situations?


For somewhat of a resolution, see Selfish preferences and self-modification.