Why Bayesians should two-box in a one-shot

Consider Newcomb’s problem.

Let ‘general’ be the claim that Omega is always right.

Let ‘instance’ be the claim that Omega is right about a particular prediction.

Assume you, the player, are not told the rules of the game until after Omega has made its prediction.

Consider 2 variants of Newcomb’s problem.

1. Omega is a perfect predictor. In this variant, you assign a prior of 1 to P(general). You are then obligated to believe that Omega has correctly predicted your action. In this case Eliezer’s conclusion is correct, and you should one-box. It’s still unclear whether you have free will, and hence have any choice in what you do next, but you can’t lose by one-boxing.

But you can’t assign a probability of 1 to ‘general’, because you’re a Bayesian. You derive your estimate of P(general) from the (finite) empirical data. Say you begin with a prior of 0.5 before considering any observations. Then you observe all of Omega’s N predictions, and each time, Omega gets it right, and you update:

P(general | instance) = P(instance | general) P(general) / P(instance)
= P(general) / P(instance)     (since P(instance | general) = 1)

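Each correct prediction raises P(general), but as long as P(instance) < 1 the result stays below 1. Omega would need to make an infinite number of correct predictions before your posterior for P(general) could reach 1. So this case is theoretically impossible, and should not be considered.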
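The update can be sketched numerically. This is only an illustration under an assumption the post does not make: that if ‘general’ is false, Omega still gets each prediction right independently with some probability q_alt. Both q_alt and the function name posterior_general are hypothetical.

```python
def posterior_general(prior: float, n_correct: int, q_alt: float) -> float:
    """Posterior P(general) after n_correct consecutive correct predictions.

    q_alt is an ASSUMED per-prediction success rate for Omega in the case
    where 'general' is false; it is not specified anywhere in the post.
    """
    likelihood_general = 1.0             # 'general' guarantees every prediction is right
    likelihood_alt = q_alt ** n_correct  # chance of the same record if 'general' is false
    numerator = likelihood_general * prior
    return numerator / (numerator + likelihood_alt * (1.0 - prior))


# Starting from the prior of 0.5 used in the text, with an assumed q_alt of 0.9.
for n in (0, 1, 10, 100):
    print(n, posterior_general(0.5, n, 0.9))
```

For any q_alt above 0 the posterior climbs toward 1 as n grows but never equals it for finite n (floating-point rounding aside).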

2. Omega is a “nearly perfect” predictor. You assign P(general) a value very, very close to 1. You must, however, do the math and try to compare the expected payoffs, at least in an order-of-magnitude way, and not just use verbal reasoning as if we were medieval scholastics.

The argument for two-boxing is that your action now can’t affect what Omega did in the past. That is, we are using a model which includes not just P(instance | general), but also the interaction of your action, the contents of the boxes, and the claim that Omega cannot violate causality. If Omega cannot violate causality, then the contents of the box are independent of your choice, so P( P($1M box is empty | you one-box) = P($1M box is empty | you two-box) ) >= P(Omega cannot violate causality), and that needs to be entered into the computation.

Numerically, two-boxers claim that the high probability they assign to our understanding of causality being basically correct more than cancels out the high probability of Omega being correct.
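One way to make that claim concrete is a two-partition expected-value calculation. This is a rough sketch, not the only possible model: it assumes that with probability p_violation your choice really does determine the box contents (with Omega matching your choice at rate accuracy), and that otherwise the $1M box is full with some fixed probability p_full regardless of what you do. All parameter names and the example numbers are illustrative assumptions.

```python
def expected_payoffs(p_violation, accuracy, p_full, big=1_000_000, small=1_000):
    """Return (EV of one-boxing, EV of two-boxing) in dollars.

    p_violation: ASSUMED probability that your choice can affect the contents.
    accuracy:    P(Omega's prediction matches your choice) in that partition.
    p_full:      P($1M box is full) when causality holds, same for either choice.
    """
    p_causal = 1.0 - p_violation
    # Causality holds: contents are fixed, so two-boxing just adds the small prize.
    one_causal = p_full * big
    two_causal = p_full * big + small
    # Causality violated: contents track your choice with the given accuracy.
    one_retro = accuracy * big
    two_retro = (1.0 - accuracy) * big + small
    ev_one = p_causal * one_causal + p_violation * one_retro
    ev_two = p_causal * two_causal + p_violation * two_retro
    return ev_one, ev_two


ev_one, ev_two = expected_payoffs(p_violation=1e-6, accuracy=1.0, p_full=0.5)
print(ev_one, ev_two, ev_two - ev_one)
```

In this model the difference EV(two-box) - EV(one-box) works out to small - p_violation * (2 * accuracy - 1) * big, which does not depend on p_full. Two-boxing wins whenever p_violation * (2 * accuracy - 1) is below small / big, which is 0.001 for the standard payoffs.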

The argument for one-boxing is that you aren’t entirely sure you understand physics, but you know Omega has a really good track record—so good that it is more likely that your understanding of physics is false than that you can falsify Omega’s prediction. This is a strict reliance on empirical observations as opposed to abstract reason: count up how often Omega has been right and compute a prior.

However, if we’re going to be strict empiricists, we should double down on that, and set our prior on P(cannot violate causality) strictly empirically—based on all observations regarding whether or not things in the present can affect things in the past.

In principle this includes every particle interaction in our observable universe. The usable number is lower than that, since many interactions in which the future affected the past could probably occur without our noticing. But the number of observations any one person has made in which future events appear to have failed to affect present ones is certainly very large, and the accumulated experience of the entire human race must provide more bits in favor of the hypothesis that causality can’t be violated than the comparatively paltry number of observations of Omega’s predictions provides for Omega’s infallibility, unless Omega is very busy indeed. And even if Omega has somehow made enough predictions, most of them are as inaccessible to you as observations of the laws of causality working on the dark side of the moon. You, personally, cannot have observed Omega make more correct predictions than the number of events you have observed in which the future failed to affect the present.

You could compute a new payoff matrix that made it rational to one-box, but the ratio between the payoffs would need to be many orders of magnitude higher. You’d have to compute it in utilons rather than dollars, because utility doesn’t scale linearly with dollars. And that means you’d run into the problem that humans have some upper bound on utility: they aren’t cognitively complex enough to achieve utility levels 10^10 times greater than “won $1,000”. So it still might not be rational to one-box, because the utility payoff for one-boxing might need to be larger than you, as a human, could experience.
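The size of the required ratio can be estimated with a small sketch. It assumes a simple model in which one-boxing only pays off in the partition where causality is violated, so one-boxing wins only if big / small exceeds 1 / (p_violation * (2 * accuracy - 1)). The function name and the probability 1e-10 are hypothetical, chosen only to match the 10^10 figure mentioned above.

```python
def break_even_ratio(p_violation: float, accuracy: float = 1.0) -> float:
    """Ratio big/small above which one-boxing has the higher expected payoff.

    p_violation: ASSUMED probability that your choice can affect the box contents.
    accuracy:    P(Omega's prediction matches your choice) in that case.
    Returns infinity when one-boxing can never come out ahead in this model.
    """
    edge = p_violation * (2.0 * accuracy - 1.0)
    if edge <= 0.0:
        return float("inf")
    return 1.0 / edge


# Standard Newcomb payoffs have a ratio of 1,000,000 / 1,000 = 1,000.
print(break_even_ratio(1e-10))
```

With an illustrative p_violation of 1e-10, the large prize would have to be worth about 10^10 times the small one, in utility rather than dollars, before one-boxing came out ahead.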


The case in which you get to think about what to do before Omega studies you and makes its decision is more complicated, because your probability calculation then also depends on what you think you would have done before Omega made its decision. This only affects the partition of your probability calculation in which Omega can alter the past, however, so numerically it doesn’t make a big difference.

The trick here is that most statements of Newcomb’s problem are ambiguous as to whether you are told the rules before Omega studies you, and as to which decision they’re asking you about when they ask if you one-box or two-box. Are they asking about what you pre-commit to, or what you eventually do? These are separate decisions, but they cannot be analyzed in isolation from each other.

As long as we focus on the single decision at the point of action, then the analysis above (modified as just mentioned) still follows. If we ask what the player should plan to do before Omega makes its decision, then the question is just whether you have a good enough poker face to fool Omega. Here it takes no causality violation for Omega to fill the boxes in accordance with your plans, so that factor does not enter in, and you should plan to one-box.

If you are a deterministic AI, that implies that you will one-box. If you’re a GOFAI built according to the old-fashioned symbolic logic AI designs talked about on LW (which, BTW, don’t work), it implies you will probably one-box even if you’re not deterministic, as otherwise you would need to be inconsistent, which is not allowed with GOFAI architectures. If you’re a human, you’d theoretically be better off if you could suddenly see things differently when it’s time to choose boxes, but that’s not psychologically plausible. In no case is there a paradox, or any real difficulty to the decision to one-box.

Iterated Games

Everything changes with iterated interactions. It’s useful to develop a reputation for one-boxing, because this may convince people that you will keep your word even when it seems disadvantageous to you. It’s useful to convince people that you would one-box, and it’s even beneficial, in certain respects, to spread the false belief in the Bayesian community that Bayesians should one-box.

Read Eliezer’s post carefully, and I think you’ll agree that the reasoning Eliezer gives for one-boxing is not that it is the rational solution to a one-off game—it’s that it’s a winning policy to be the kind of person who one-boxes. That’s not an argument that the payoff matrix of an instantaneous decision favors one-boxing; it’s an argument for a LessWrongian morality. It’s the same basic argument as that honoring commitments is a good long-term strategy. But the way Eliezer stated it has given many people the false impression that one-boxing is actually the rational choice in an instantaneous one-shot game (and that’s the only interpretation which would make it interesting).

The one-boxing argument is so appealing because it offers a solution to difficult coordination problems. It makes it appear that rational altruism and a rational utopia are within our reach.

But this is wishful thinking, not math, and I believe that the social norm of doing the math is even more important than a social norm of one-boxing.