EDT with updating double counts

I recently got confused thinking about the following case:

Calculator bet: I am offered the opportunity to bet on a mathematical statement X to which I initially assign 50% probability (perhaps X = 139926 is a quadratic residue modulo 314159). I have access to a calculator that is 99% reliable, i.e. it corrupts the answer 1% of the time at random. The calculator says that X is true. With what probability should I be willing to wager?

I think the answer is clearly “99%.” But a naive application of EDT can recommend betting with 99.99% probability. I think this is a mistake, and understanding the mistake helps clarify what it means to be “updateless” and why it’s essentially obligatory for EDT agents. My takeaway is that for an EDT agent, Bayesian updating is a description of the expected utility calculation rather than something that an EDT agent should do to form its beliefs before calculating expected utility.

Thanks to Joe Carlsmith and Katja Grace for the conversation that prompted this post. I suspect this point is well-known in the philosophy literature. I’ve seen related issues discussed in the rationalist community, especially in this sequence and this post but found those a bit confusing—in particular, I think I initially glossed over how “SSA” was being used to refer to a view which rejects bayesian updating on observations (!) in this comment and the linked paper. In general I’ve absorbed the idea that decision theory and anthropics had a weird interaction, but hadn’t noticed that exactly the same weirdness also applied in cases where the number of observers is constant across possible worlds.

Why EDT bets at 99.99% odds (under some conditions)

I’ll make four assumptions:

I have impartial values. Perhaps I’m making a wager where I can either make 1 person happy or 99 people happy—I just care about the total amount of happiness, not whether I am responsible for it. I’ll still describe the payoffs of the bets in $, but imagine that utility is a linear function of total $ earned by all copies of me.
We live in a very big universe where many copies of me all face the exact same decision. This seems plausible for a variety of reasons; the best one is accepting an interpretation of quantum mechanics without collapse (a popular view).
I handle logical uncertainty in the same way I handle empirical uncertainty. You could construct a similar case to the calculator bet using empirical uncertainty, but the correlation across possible copies of me is clearest if I take a logical fact.
I form my beliefs E by updating on my observations. Then after updating I consider E[utility|I take action a] and E[utility|I take action a’] and choose the action with higher expected utility.

Under these assumptions, what happens if someone offers me a bet of $1 at 99.9% odds? If I take the bet I’ll gain $1 if X is true, but lose $1000 if X turns out to be false. Intuitively this is a very bad bet, because I “should” only have 99% confidence. But under these assumptions EDT thinks it’s a great deal.

To calculate utility, I need to sum up over a bunch of copies of me.
- Let N be the number of copies of me in the universe who are faced with this exact betting decision.
- My decision is identical to the other copies of me who also observed their calculator say “X is true”.
- My decision may also be correlated with copies of me who made a different observation, or with totally different people doing totally different things, but those don’t change the bottom line and I’ll ignore them to keep life simple.
- So I’ll evaluate the total money earned by people who saw their calculator say “X is true” and whose decision is perfectly correlated with mine.
To calculate utility, I first calculate the probability of X and then calculate expected utility.
- First I update on the fact that my calculator says X is true. This observation has probability 99% if X is true and 1% if X is false. The prior probability of X was 50%, so the posterior probability is 99%.
- My utility is the total amount of money made by all N copies of me, averaged over the world where X is true (with 99% weight) and the world where X is false (with 1% weight)
So to calculate the utility conditioned on taking the bet, I ask two questions:
- Suppose that X is true, and I decide to take the bet. What is my utility then?
  If X is true, there are 0.99N copies of me who all saw their calculator correctly say “X is true.” So I gain $1 * 0.99N = $0.99N.
- Suppose that X is false, and I decide to take the bet. What is my utility then?
  If X is false, then there are 0.01N copies of me who saw their calculator incorrectly say “X is true.” So I lose $1000 * 0.01N = $10N.
- I think that there’s a 99% probability that X is true, so my expected utility is 99% x $0.99N – 1% x $10N = $0.88N.
If I don’t take the bet, none of my copies win or lose any money. So we get $0 utility, which is much worse than $0.88N.
Therefore I take the bet without thinking twice.
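
The arithmetic above can be reproduced with a minimal sketch. The function name and parameters are illustrative (they are not from any library), and the result is expressed per copy, i.e. divided by N. It follows the naive procedure exactly as described: update on the calculator first, then sum over the copies who saw “X is true.”

```python
from fractions import Fraction


def naive_edt_value_per_copy(posterior_x, reliability, win, loss):
    """Naive EDT value of taking the bet, divided by N (hypothetical helper)."""
    # If X is true, a fraction `reliability` of copies saw "X is true" and win.
    payoff_if_true = reliability * win
    # If X is false, a fraction (1 - reliability) saw "X is true" and lose.
    payoff_if_false = -(1 - reliability) * loss
    # Weight the two worlds by the POSTERIOR (this is the step the post questions).
    return posterior_x * payoff_if_true + (1 - posterior_x) * payoff_if_false


reliability = Fraction(99, 100)
prior = Fraction(1, 2)

# Ordinary Bayesian update on "the calculator says X is true".
posterior = prior * reliability / (
    prior * reliability + (1 - prior) * (1 - reliability)
)

value = naive_edt_value_per_copy(posterior, reliability, win=1, loss=1000)
print(posterior)     # 99/100
print(float(value))  # 0.8801, i.e. about $0.88N in total
```

Since declining the bet is worth $0, this calculation prefers taking it.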

Failure diagnosis

Intuitively 99.99% is the wrong answer to this question. But it’s important to understand what actually went wrong. After all, intuitions could be mistaken and maybe big universes lead to weird conclusions (I endorse a few of those myself). Moreover, if you’re like me and think the “obvious” argument for EDT is compelling, this case might lead you to suspect something has gone wrong in your reasoning.

The intuitive problem is that we are “updating” on the calculator’s verdict twice:

First when we form our beliefs about whether X is true.
Second when we ask “If X is true, how many copies of me would have made the current observations, and therefore make a decision correlated with my own?”
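
One way to see where 99.99% comes from is to multiply the two odds ratios together. This is a rough sketch with made-up variable names, assuming even prior odds and the 99% reliable calculator from the example.

```python
from fractions import Fraction

reliability = Fraction(99, 100)

# First "update": posterior odds on X after conditioning on the verdict
# (prior odds are 1:1, so the posterior odds equal the likelihood ratio).
belief_odds = reliability / (1 - reliability)

# Second "update": ratio of correlated copies who saw "X is true"
# in the X-true world versus the X-false world.
copy_count_odds = reliability / (1 - reliability)

# The naive calculation effectively bets at the product of the two.
naive_odds = belief_odds * copy_count_odds
naive_betting_probability = naive_odds / (naive_odds + 1)

print(naive_odds)                        # 9801
print(float(naive_betting_probability))  # roughly 0.9999
```

Each factor alone gives odds of 99:1, which is the intuitively correct answer; applying both gives 9801:1.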

The second “update” is pretty much inherent in the nature of EDT—if I care about the aggregate fate of all of the people like me, and if all of their decisions are correlated with mine, then I need to perform a sum over all of them and so I will care twice as much about possible worlds where there are twice as many of them. Rejecting this “update” basically means rejecting EDT.

The first “update” looks solid at first, since Bayesian updating on evidence seems like a bedrock epistemic principle. But I claim this is actually where we ran into trouble. In my view there is an excellent simple argument for using EDT to make decisions, but there is no good argument for using beliefs formed by conditioning on your observations as the input into EDT.

This may sound a bit wild, but hear me out. The basic justification for updating is essentially decision-theoretic: either it’s about counting the observers across possible worlds who would have made your observations, or it’s about Dutch book arguments constraining the probabilities with which you should bet. (As an example, see SEP on Bayesian epistemology.) I’ve internalized these arguments enough that updating can feel like a primitive bedrock of epistemology, but really they only constrain how you should bet (or maybe what “you” should expect to see next). They don’t say much about what you should “expect” in any observer-independent sense that would be relevant to a utility calculation for an impartial actor.

If you are an EDT agent, the right way to understand discussions of “updating” is as a description of the calculation done by EDT. Indeed, it’s common to use the word “belief” to refer to the odds at which you’d bet, in which case beliefs are the output of EDT rather than the input. Other epistemological principles do help constrain the input to EDT (e.g. principles about simplicity or parsimony or whatever), but not updating.

This is similar to the way that an EDT agent sees causal relationships: as helpful descriptions of what happens inside normatively correct decision making. Updating and causality may play a critical role in algorithms that implement normatively correct decision making, but they are not inputs into normatively correct decision making. Intuitions and classical arguments about the relevance of these concepts can be understood as what those algorithms feel like from the inside, as agents who have evolved to implement (rather than reason about) correct decision-making.

“Updatelessness” as a feature of preferences

On this perspective whether to be “updateless” isn’t really a free parameter in EDT—there is only one reasonable theory, which is to use the prior probabilities to evaluate conditional utilities given each possible decision that an agent with your nature and observations could make.
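
Applied to the calculator bet, this prescription can be sketched as follows. The helper name is hypothetical and the value is again per copy (divided by N). The only change from the naive calculation is that the two worlds are weighted by the prior rather than the posterior.

```python
from fractions import Fraction


def prior_weighted_value_per_copy(prior_x, reliability, win, loss):
    """EDT value of taking the bet using PRIOR weights, divided by N."""
    # The observation enters only once: through the fraction of copies
    # who saw "X is true" in each world.
    return prior_x * reliability * win - (1 - prior_x) * (1 - reliability) * loss


prior = Fraction(1, 2)
reliability = Fraction(99, 100)

# The 99.9% bet from earlier: win $1 or lose $1000.
print(float(prior_weighted_value_per_copy(prior, reliability, 1, 1000)))  # -4.505

# A bet at 99:1 is exactly break-even, matching the intuitive 99% answer.
print(prior_weighted_value_per_copy(prior, reliability, 1, 99))  # 0
```

Under these assumptions the agent declines the $1000 bet and is indifferent at 99% odds, so the “belief” of 99% shows up as an output of the calculation.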

So what are we to make of cases like transparent Newcomb that appear to separate EDT from UDT?

I currently think of this as a question of values or identity (though I think this is dicier than the earlier part of the post). Consider the following pair of cases to illustrate:

I am split into two copies A and B who will go on to live separate lives in separate (but otherwise identical) worlds. There is a button in front of each copy. If copy A presses the button, they will lose $1 and copy B will gain $2. If copy B presses the button, nothing happens. In this case, all versions of EDT will press the button. In some sense at this point the two copies must care about each other, since they don’t even know which one they are, and so the $1 of loss and $2 of gain can be compared directly.
But now suppose that copy A sees the letter “A” and copy B sees the letter “B.” Now no one cares what I do after seeing “B,” and if I see “A” the entire question is whether I care what happens to the other copy. The “updateless” answer is to care about all the copies of yourself who made different observations. The normal “selfish” answer is to care about only the copy of yourself who has made the same observations.
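
The difference between the two answers can be made concrete with a toy calculation. The payoffs (lose $1, gain $2) come from the example above; the “care weights” are an illustrative assumption of mine, not something the argument depends on in detail.

```python
def value_of_pressing(weight_on_a, weight_on_b):
    """Value assigned to copy A pressing the button (hypothetical helper).

    Pressing costs copy A $1 and gives copy B $2; the weights stand in
    for how much the decision-maker cares about each copy.
    """
    return weight_on_a * (-1) + weight_on_b * 2


# Before seeing a letter: I don't know which copy I am.
before_letters = value_of_pressing(0.5, 0.5)

# After seeing "A", selfish: only the copy who saw "A" counts.
selfish_after_a = value_of_pressing(1.0, 0.0)

# After seeing "A", updateless: keep the pre-observation weights.
updateless_after_a = value_of_pressing(0.5, 0.5)

print(before_letters, selfish_after_a, updateless_after_a)
```

With these weights, pressing looks good before the letters appear and for the updateless agent afterwards, and bad only for the selfish agent who has seen “A.”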

This framing makes it clear and relatively uninteresting why you should modify yourself to be updateless: any pair of agents could benefit from a bilateral commitment to value each other’s welfare. It’s just that A and B start off being the same, and so they happen to be in an exceptionally good position to make such a commitment, and it’s very clear what the “fair” agreement is.

What if the agents aren’t selfish? Say they both just want to maximize happiness?

If both agents exist and they are just in separate worlds, then there is no conflict between their values at all, and they always push the button.
Suppose that only one agent exists. Then it feels weird, seeing button “B,” to press the button knowing that it causes you to lose $1 in the real, actually-existing world. But in this case I think the problem comes from the sketchy way we’re using the word “exist”: if copy B gets money based on copy A’s decision, then in what sense exactly does copy A “not exist”? What are we to make of the version of copy A who is doing the same reasoning, and is apparently wrong about whether or not they exist? I think these cases are confusing because of a misuse of “existence” as a concept rather than because of updatelessness per se.