Anthropic decision theory for selfish agents
Consider Nick Bostrom’s Incubator Gedankenexperiment, phrased as a decision problem. In my mind, this provides the purest and simplest example of a non-trivial anthropic decision problem. In an otherwise empty world, the Incubator flips a coin. If the coin comes up heads, it creates one human, while if the coin comes up tails, it creates two humans. Each created human is put into one of two indistinguishable cells, and there’s no way for created humans to tell whether another human has been created or not. Each created human is offered the opportunity to buy a lottery ticket which pays 1$ if the coin has shown tails. What is the maximal price that you would pay for such a lottery ticket? (Utility is proportional to dollars.) The two traditional answers are 1/2$ and 2/3$.
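The two answers correspond to two different ways of counting outcomes. The following minimal Monte Carlo sketch in Python (the function simulate_incubator and the trial count are illustrative assumptions, not part of the original problem) counts tails once per coin flip and, separately, once per created human. It only shows where the two numbers come from; it does not by itself settle which count is the right one for a decision.

```python
import random


def simulate_incubator(trials: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo sketch of the Incubator (hypothetical helper).

    Returns (fraction of coin flips showing tails,
             fraction of created humans who live in a tails world).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    tails_flips = 0
    humans_total = 0
    humans_in_tails = 0
    for _ in range(trials):
        tails = rng.random() < 0.5       # fair coin
        n_humans = 2 if tails else 1     # heads: one human, tails: two humans
        humans_total += n_humans
        if tails:
            tails_flips += 1
            humans_in_tails += n_humans
    return tails_flips / trials, humans_in_tails / humans_total


per_flip, per_human = simulate_incubator(100_000)
print(f"tails per coin flip:     {per_flip:.3f}")   # close to 1/2
print(f"tails per created human: {per_human:.3f}")  # close to 2/3
```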
We can try to answer this question for agents with different utility functions: total utilitarians; average utilitarians; and selfish agents. UDT’s answer is that total utilitarians should pay up to 2/3$, while average utilitarians should pay up to 1/2$; see Stuart Armstrong’s paper and Wei Dai’s comment. There are some heuristic ways to arrive at UDT prescriptions, such as asking “What would I have precommitted to?” or arguing based on reflective consistency. For example, a CDT agent that expects to face Counterfactual Mugging-like situations in the future (with predictions also made in the future) will self-modify to become a UDT agent, i.e., one that pays the counterfactual mugger.
Now, these kinds of heuristics are not applicable to the Incubator case. It is meaningless to ask “What maximal price should I have precommitted to?” or “At what odds should I bet on coin flips of this kind in the future?”, since the very point of the Gedankenexperiment is that the agent’s existence is contingent upon the outcome of the coin flip. Can we come up with a different heuristic that leads to the correct answer? Imagine that the Incubator’s subroutine that is responsible for creating the humans is completely benevolent towards them (let’s call this the “Benevolent Creator”). (We assume here that the humans’ goals are identical, such that the notion of benevolence towards all humans is completely unproblematic.) The Benevolent Creator has the power to program into the humans the maximal price they will pay for the lottery tickets. A moment’s thought shows that this indeed leads to UDT’s answers for average and total utilitarians. For example, consider the case of total utilitarians. If the humans pay x$ for the lottery tickets, the expected utility is 1/2*(-x) + 1/2*2*(1-x) = 1 - 3x/2. So indeed, the break-even price is reached for x=2/3.
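The Benevolent Creator’s calculation can be checked with exact fractions. This is a sketch under the setup above; creator_expected_utility and break_even are hypothetical helper names, and break_even simply solves for the zero of an expected utility that is affine in x. The average utilitarian case is included for comparison: there, the tails world contributes 1-x rather than 2*(1-x).

```python
from collections.abc import Callable
from fractions import Fraction


def creator_expected_utility(x: Fraction, total: bool) -> Fraction:
    """Expected utility, from the Benevolent Creator's point of view, of
    programming every human to buy the ticket at price x.

    Heads (prob. 1/2): one human pays x and the ticket is worthless.
    Tails (prob. 1/2): two humans each pay x and each receive 1.
    Total utilitarians sum over humans; average utilitarians average.
    """
    half = Fraction(1, 2)
    heads_utility = -x
    tails_per_human = 1 - x
    tails_utility = 2 * tails_per_human if total else tails_per_human
    return half * heads_utility + half * tails_utility


def break_even(eu: Callable[[Fraction], Fraction]) -> Fraction:
    """Solve eu(x) = 0 for an expected utility that is affine in x."""
    intercept = eu(Fraction(0))
    slope = eu(Fraction(1)) - intercept
    return -intercept / slope


print(break_even(lambda x: creator_expected_utility(x, total=True)))   # 2/3
print(break_even(lambda x: creator_expected_utility(x, total=False)))  # 1/2
```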
But what about selfish agents? For them, the Benevolent Creator heuristic is no longer applicable. Since the humans’ goals do not align, the Creator cannot share them. As Wei Dai writes, the notion of selfish values does not fit well with UDT. In Anthropic decision theory, Stuart Armstrong argues that selfish agents should pay up to 1/2$ (Sec. 3.3.3). His argument is based on an alleged isomorphism between the average utilitarian and the selfish case. (For instance, donating 1$ to each human increases utility by 1 for both average utilitarian and selfish agents, while it increases utility by 2 for total utilitarians in the tails world.) Here, I want to argue that this is incorrect and that selfish agents should pay up to 2/3$ for the lottery tickets.
(Needless to say, all the bold statements I’m about to make are based on an “inside view”. An “outside view” tells me that Stuart Armstrong has thought much more carefully about these issues than I have, and has discussed them with a lot of smart people, which I haven’t, so chances are my arguments are flawed somehow.)
In order to make my argument, I want to introduce yet another heuristic, which I call the Submissive Gnome. Suppose each cell contains a gnome which is already present before the coin is flipped. As soon as it sees a human in its cell, it instantly adopts the human’s goal. From the gnome’s perspective, SIA odds are clearly correct: Since a human is twice as likely to appear in the gnome’s cell if the coin shows tails, Bayes’ Theorem implies that the probability of tails is 2⁄3 from the gnome’s perspective once it has seen a human. Therefore, the gnome would advise the selfish human to pay up to 2/3$ for a lottery ticket that pays 1$ in the tails world. I don’t see any reason why the selfish agent shouldn’t follow the gnome’s advice. From the gnome’s perspective, the problem is not even “anthropic” in any sense; there’s just straightforward Bayesian updating.
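The gnome’s update is ordinary Bayes’ theorem, sketched here in Python with exact fractions. The assumption, made explicit later in the post, is that under heads the single human ends up in one of the two cells, so a given gnome sees a human with probability 1/2 under heads and with certainty under tails. Because the ticket pays 1$, the selfish break-even price equals the posterior probability of tails.

```python
from fractions import Fraction

# Prior over the coin flip (fair coin).
prior = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# Probability that a human appears in this particular gnome's cell.
# Heads: one human, in one of the two cells -> 1/2.
# Tails: two humans, one per cell -> 1.
likelihood = {"heads": Fraction(1, 2), "tails": Fraction(1)}

# Bayes' theorem: P(coin | a human appeared in my cell).
evidence = sum(prior[c] * likelihood[c] for c in prior)
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}

# The ticket pays 1 in the tails world, so a selfish human breaks even
# when the price equals P(tails | a human appeared in my cell).
max_price = posterior["tails"] * 1

print(posterior["tails"], max_price)  # 2/3 2/3
```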
Suppose we want to use the Submissive Gnome heuristic to solve the problem for utilitarian agents. (ETA: Total/average utilitarianism includes the well-being and population of humans only, not of gnomes.) The gnome reasons as follows: “With probability 2⁄3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively.” The gnome’s advice disagrees with UDT and the solution based on the Benevolent Creator. Something has gone terribly wrong here, but what? The mistake in the gnome’s reasoning here is in fact perfectly isomorphic to the mistake in the reasoning leading to the “yea” answer in Psy-Kosh’s non-anthropic problem.
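For concreteness, here is the gnome’s flawed post-update calculation written out (Python, hypothetical helper names). It reproduces the numbers 2/3$ and 4/5$ and is shown only to make the discrepancy with UDT visible, not as a recommended procedure.

```python
from collections.abc import Callable
from fractions import Fraction

# The gnome's odds after seeing a human (the step that goes wrong for
# utilitarians, per the argument in the text).
P_HEADS, P_TAILS = Fraction(1, 3), Fraction(2, 3)


def gnome_eu(x: Fraction, total: bool) -> Fraction:
    # Heads: the single human pays x for a worthless ticket.
    # Tails: each of two humans pays x and receives 1; total utilitarians
    # count both humans, average utilitarians count the per-human average.
    tails_utility = (2 if total else 1) * (1 - x)
    return P_HEADS * (-x) + P_TAILS * tails_utility


def break_even(eu: Callable[[Fraction], Fraction]) -> Fraction:
    """Solve eu(x) = 0 for an expected utility that is affine in x."""
    intercept = eu(Fraction(0))
    slope = eu(Fraction(1)) - intercept
    return -intercept / slope


print(break_even(lambda x: gnome_eu(x, total=False)))  # 2/3 (average)
print(break_even(lambda x: gnome_eu(x, total=True)))   # 4/5 (total)
```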
Things become clear if we look at the problem from the gnome’s perspective before the coin is flipped. Assume, for simplicity, that there are only two cells and gnomes, 1 and 2. If the coin shows heads, the single human is placed in cell 1 and cell 2 is left empty. Since the humans don’t know in which cell they are, neither should the gnomes know. So from each gnome’s perspective, there are four equiprobable “worlds”: it can be in cell 1 or 2 and the coin flip can result in heads or tails. We assume, of course, that the two gnomes are, like the humans, sufficiently similar such that their decisions are “linked”.
We can assume that the gnomes already know what utility functions the humans are going to have. If the humans will be (total/average) utilitarians, we can then even assume that the gnomes already are so, too, since the well-being of each human is as important as that of any other. Crucially, then, for both utilitarian utility functions, the question whether the gnome is in cell 1 or 2 is irrelevant. There is just one “gnome advice” that is given identically to all (one or two) humans. Whether this advice is given by one gnome or the other or both of them is irrelevant from both gnomes’ perspective. The alignment of the humans’ goals leads to alignment of the gnomes’ goals. The expected utility of a given piece of advice can simply be calculated by taking probability 1⁄2 for both heads and tails, and introducing a factor of 2 in the total utilitarian case, leading to maximal prices of 1⁄2$ for average and 2⁄3$ for total utilitarians, in accordance with UDT and the Benevolent Creator.
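The pre-flip calculation can be sketched by enumerating the four equiprobable (cell, coin) worlds. In this Python sketch (function names are hypothetical), the utilitarian value of a world depends only on the coin, which is the sense in which the gnome’s cell is irrelevant; the resulting break-even prices match the Benevolent Creator’s.

```python
from collections.abc import Callable
from fractions import Fraction
from itertools import product

CELLS = (1, 2)
COINS = ("heads", "tails")


def world_utility(coin: str, x: Fraction, total: bool) -> Fraction:
    """Utilitarian value of a world in which every human follows the advice
    'buy at price x'. It depends only on the coin, not on the gnome's cell."""
    if coin == "heads":
        return -x                  # one human pays x, ticket worthless
    per_human = 1 - x              # two humans, each pays x and wins 1
    return 2 * per_human if total else per_human


def pre_flip_eu(x: Fraction, total: bool) -> Fraction:
    # Four equiprobable (cell, coin) worlds from the gnome's perspective;
    # the cell label is enumerated but never used in the utility.
    return sum(Fraction(1, 4) * world_utility(coin, x, total)
               for _cell, coin in product(CELLS, COINS))


def break_even(eu: Callable[[Fraction], Fraction]) -> Fraction:
    """Solve eu(x) = 0 for an expected utility that is affine in x."""
    intercept = eu(Fraction(0))
    slope = eu(Fraction(1)) - intercept
    return -intercept / slope


print(break_even(lambda x: pre_flip_eu(x, total=False)))  # 1/2 (average)
print(break_even(lambda x: pre_flip_eu(x, total=True)))   # 2/3 (total)
```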
The situation looks different if the humans are selfish. We can no longer assume that the gnomes already have a utility function. The gnome cannot yet care about the human in its cell, since with probability 1⁄4 (if the gnome is in cell 2 and the coin shows heads) there will not be a human to care for. (By contrast, it is already possible to care about the average utility of all humans there will be, which is where the alleged isomorphism between the two cases breaks down.) It is still true that there is just one “gnome advice” that is given identically to all (one or two) humans, but the method for calculating the optimal advice now differs. In three of the four equiprobable “worlds” the gnome can live in, a human will appear in its cell after the coin flip. Two out of these three are tails worlds, so the gnome decides to advise paying up to 2/3$ for the lottery ticket if a human appears in its cell.
There is a way to restore the equivalence between the average utilitarian and the selfish case. If the humans will be selfish, we can say that the gnome cares about the average well-being of the three humans which could appear in its cell with equal likelihood: the human created after heads, the first human created after tails, and the second human created after tails. The gnome expects to adopt each of these three humans’ selfish utility function with probability 1⁄4. It thus makes sense to say that the gnome cares about the average well-being of these three humans. This is the correct correspondence between selfish and average utilitarian values and it leads, again, to the conclusion that the correct advice is to pay up to 2/3$ for the lottery ticket.
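Both selfish-case computations above can be sketched in a few lines (Python, hypothetical helper names; following the earlier simplification, the heads human is placed in cell 1). The first function sums over the four pre-flip worlds, counting the (cell 2, heads) world as utility 0 because no human appears there; the second averages the payoffs of the three humans who could appear in the cell. Since the first is 3/4 times the second, both give the same break-even price.

```python
from collections.abc import Callable
from fractions import Fraction
from itertools import product


def human_in_cell(cell: int, coin: str) -> bool:
    # Heads: the single human is placed in cell 1; tails: both cells filled.
    return coin == "tails" or cell == 1


def selfish_payoff(coin: str, x: Fraction) -> Fraction:
    # Payoff to the human in the gnome's cell if it buys the ticket at price x.
    return (1 - x) if coin == "tails" else -x


def selfish_gnome_eu(x: Fraction) -> Fraction:
    """Pre-flip expected utility of the advice 'buy at price x'.
    In the (cell 2, heads) world no human appears, so it contributes 0."""
    total = Fraction(0)
    for cell, coin in product((1, 2), ("heads", "tails")):
        if human_in_cell(cell, coin):
            total += Fraction(1, 4) * selfish_payoff(coin, x)
    return total


def three_human_average(x: Fraction) -> Fraction:
    # Average well-being of the three humans who could appear in the cell:
    # the heads human and the two tails humans.
    payoffs = [selfish_payoff("heads", x),
               selfish_payoff("tails", x),
               selfish_payoff("tails", x)]
    return sum(payoffs, Fraction(0)) / 3


def break_even(eu: Callable[[Fraction], Fraction]) -> Fraction:
    """Solve eu(x) = 0 for an expected utility that is affine in x."""
    intercept = eu(Fraction(0))
    slope = eu(Fraction(1)) - intercept
    return -intercept / slope


print(break_even(selfish_gnome_eu))     # 2/3
print(break_even(three_human_average))  # 2/3
```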
In Anthropic Bias, Nick Bostrom argues that each human should assign probability 1⁄2 to the coin having shown tails (“SSA odds”). He also introduces the possible answer 2⁄3 (“SSA+SIA”, nowadays usually simply called “SIA”) and rejects it. SIA odds have been defended by Olum. The main argument against SIA is the Presumptuous Philosopher. The main arguments for SIA and against SSA odds are that SIA avoids the Doomsday Argument¹, which most people feel has to be wrong, that SSA odds depend on whom you consider to be part of your “reference class”, and furthermore, as pointed out by Bostrom himself, that SSA odds allow for acausal superpowers.
The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions. (ETA: At least this was the impression I got. Two commenters have expressed scepticism about whether this is really the consensus view.) I think that “What are the odds at which a selfish agent should bet on tails?” is the most sensible translation of “What is the probability that the coin has shown tails?” into a decision problem. Since I’ve argued that selfish agents should take bets following SIA odds, one can employ the Presumptuous Philosopher argument against my conclusion: it seems to imply that selfish agents, like total but unlike average utilitarians, should bet at extreme odds on living in an extremely large universe, even if there’s no empirical evidence in favor of this. I don’t think this counterargument is very strong. However, since this post is already quite lengthy, I’ll elaborate on this only if I get encouraging feedback for this post.
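The Presumptuous Philosopher worry can be made concrete with the same kind of update the gnome performed, which in effect weights each hypothesis by its number of observers. In this Python sketch, sia_posterior is a hypothetical helper, and the two theories, their equal priors and the factor of a trillion are illustrative numbers in the spirit of Bostrom’s example, not figures from this post. The Incubator appears as a sanity check.

```python
from fractions import Fraction


def sia_posterior(priors: dict[str, Fraction],
                  observers: dict[str, int]) -> dict[str, Fraction]:
    """SIA-style update (hypothetical helper): weight each hypothesis by
    prior times number of observers, then renormalize."""
    weights = {h: priors[h] * observers[h] for h in priors}
    norm = sum(weights.values())
    return {h: w / norm for h, w in weights.items()}


# Sanity check with the Incubator: tails has twice as many humans.
incubator = sia_posterior({"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
                          {"heads": 1, "tails": 2})

# Illustrative Presumptuous Philosopher numbers: two theories with equal
# prior, the "big" one containing a trillion times as many observers.
philosopher = sia_posterior({"small": Fraction(1, 2), "big": Fraction(1, 2)},
                            {"small": 1, "big": 10**12})

print(incubator["tails"])  # 2/3
print(philosopher["big"])  # 1000000000000/1000000000001
```

If selfish agents bet at SIA odds, as argued above, this would mean paying almost 1$ for a ticket that pays 1$ if the big theory is true, which is exactly the conclusion the counterargument finds presumptuous.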
¹ At least its standard version. SIA comes with its own Doomsday conclusions, cf. Katja Grace’s thesis Anthropic Reasoning in the Great Filter.