# Pure Bayesianism vs Newcomb’s Paradox

In this article, I analyze Newcomb’s paradox with *pure Bayesianism*. The focus will be on the comparison of counterfactual expected gains and of the **1-boxer** and **2-boxer** strategies, using only the laws of probability. The main trick of the analysis is a clear separation between agents’ epistemic thinking (assumed to be *pure Bayesianism*) and their decision-making processes.

I prove that *engaging* in counterfactual optimization implies **2-boxing**. Weirdly enough, though, I also show the player *can* believe that , if the player thinks that the predictor **Omega** knows sufficiently more about the player’s decision-making algorithm than the player does themself. In such a case, counterfactual optimization would favor **1-boxing**. However, even in this case, as we will explain in greater details, this does not mean that the player should then *engage* in counterfactual reasoning.

I will then make a few remarks about non-Bayesians, before concluding.

# A slightly generalized Newcomb’s paradox

Newcomb’s paradox is a classical paradox of decision theory, which has been widely discussed by philosophers, as well as here on LessWrong. Evidently, I have not dug through the entire (massive!) literature on the problem, so I apologize if the analysis below has been done elsewhere.

Let’s start by describing the problem. Two boxes A and B are presented to Alice.

**Box A**is opaque. Its content is unknown to**Alice**. But**Alice**knows that an algorithm called**Omega***already*decided what amount to put in, according to principles that we shall detail later on. The content of**Box A**can no longer be changed.**Box B**is transparent and contains $1,000.

Alice must choose between the **1-boxer** strategy and the **2-boxer** strategies.

The

**1-boxer**only takes**Box A**.The

**2-boxer**takes*both***Box A***and***Box B**.

What makes Newcomb’s paradox interesting is the way **Omega** decides what to put in **Box A**. Essentially, **Omega** will put a large amount of money in **Box A **if it predicts the player to choose the **1-boxer** strategy.

To make it realistic, here, we typically assume that **Omega** gathered a huge amount of data on **Alice** like, say, her entire Google, Facebook and Amazon data. Based on this and all sorts of other data, such as this online discussion about Newcomb’s paradox, **Omega** made a probabilistic guess about what strategy **Alice** will choose. In this article, we’ll assume that **Omega** is a *pure Bayesian*, in the sense that the probabilistic guess is derived from the laws of probability.

Denote the probability estimated by **Omega** that **Alice** will follow the **1-boxer** strategy. Here, we assume that **Omega** flips a biased coin, so that, with probability , it puts $1,000,000 in **Box A**. Otherwise, **Omega** leaves **Box A** empty.

(The classical version of Newcomb’s paradox is the special case where the probability always either equals 0 or 1)

**Alice** knows all of this. The two boxes are in front of her. She can either take take **Box A **only (**1-boxing)**, or **Alice** can take *both* **Box A** and **Box B **(**2-boxing**).

Imagine you’re **Alice**. What do you do?

Go.

# Newcomb as a counterfactual problem

Hopefully, you’ve already had endless debates about this famous paradox, including with people you disagree with. Essentially, the **1-boxer** will say that their expected gain is larger than the **2-boxer’s**. The **2-boxer** will respond that what matters is that they got $1,000 more than what they would have got, had they taken only one box.

It turns out that both the **1-boxer** and the **2-boxer** are engaging in counterfactual reasoning. In other words, they compare their expected gain as they follow their strategy, to what they would have got in expectation by following the opposite strategy. In probabilistic terms, both compared to , where is the content of **Box A**, is the content of **Box B**, 1 corresponds to the **1-boxer** strategy, and 2 corresponds to the **2-boxer** strategy.

Evidently, the crux of the problem is the uncertainty on the content of **Box A**, which affects the probability computation. The **1-boxer** claims that which leads them to conclude that . However, the **2-boxer** argues that , which leads to the conclusion .

So which is it? Who’s right?

# Pure Bayesianism*

In causal decision theory, it is common to consider expectations and , where the do-operator consists of forcing the probability distribution over all variables that are not “caused” by event 1 and 2 to remain the same. Since the content of the box is usually considered *not* to be “caused” by event 1 and 2, it is common in causal decision theory to consider that , in which case . It is thus common in causal decision theory to argue that the **2-boxer **strategy is the better strategy.

Annoyingly, causal decision theory does have “bugs”. In particular, it is arguably severely “model-dependent”, in the sense that our choice of the causal model affects what the do-operator really means. Thus, two causal decision theorists may still disagree, because they choose to model causality differently — this “bug” applies to “causality” more generally. One way to fix this would be to consider a universal prior over all causal models. But then, one might wonder why we dogmatically chose to restrict ourselves to causal models only, thereby steering us away from the universal prior over all computable theories that Solomonoff created for us.

In any case, by adding the do-operator, causal decision theory is **not** *purely* Bayesian; it is *adding* an axiom to the laws of probability. But this is not specific to causal decision theory. More generally, the trouble is that *pure Bayesianism* does not model decision-making. Granted, it may seem “natural” to make decisions based on the maximization of expected future rewards, conditioned on all observed data. But it should be stressed that this is *one* way of making decisions, which is no way derived from the laws of probability themselves. So to be sure we’re going for *pure Bayesianism*, let’s be agnostic to decision-making processes.

This means in particular that we will clearly separate an individual’s epistemic thinking from its decision-making process. The epistemic thinking computes how the individual is viewing the world. It is the individual’s best attempt to describe the world as it is, including objects in the world such as the individual themself. In this article, we will assume that both **Alice** and **Omega** will rely on *pure Bayesianism* to do so.

What I mean by *pure Bayesianism* here can be *Solomonoff’s induction*, which runs on a universal prior over all *computable probability distributions over data. *But I won’t need the full artillery of Solomonoff. Let’s just consider that any prediction is derived from applying solely the laws of probability with some* *initial prior on any observable data.

Note that this means that we are ignoring computational complexity constraints, at least for now. We will come back to this point later on. Arguably, what makes this problem interesting in practice is precisely the fact that Bayesian computations are intractable (if not uncomputable!). But for now, we will simply assume that both **Alice **and **Omega** have infinite computational power, which allows them to run any terminating algorithm instantly, and to immediately identify algorithms that never halt as such.

Now, **Omega’s** decision-making process will exploit its epistemic thinking, as has been described above. Namely, it will decide to put $1M in **Box A** with probability equal to its prediction that **Alice **will choose the **1-boxer** strategy.

However, we will not assume anything about **Alice’s** decision-making process. In fact, **Alice’s** actual decision-making algorithm may or may not exploit her epistemic computations. Typically, if **Alice** pre-commits to **1-boxing**, then her decision-making algorithm will not be exploiting her epistemic thinking.

# The zero-surprise theorem

Clearly, if** Alice’s **epistemic system perfectly knows her decision-making algorithm, then this epistemic system can predict the decisions she will make. Indeed, it can simply run the decision-making algorithm to derive a deterministic prediction of **Alice’s** decision. But this has a weird consequence. Indeed, given all of her past data, * Alice essentially cannot be surprised by her decisions *given these data!

The reason why this remark is critical is because of an easy Bayesian theorem, which I’ll call the zero-surprise theorem.

Theorem (zero surprise).If a new piece of data is unsurprising to a Bayesian, then this Bayesian does not change her belief by learning . More precisely, for any theory , if , then .

Proof.This follows straightforwardly from Bayes rule, after noting that, for , implies .

Thus, if **Alice’s **decision-making fully follows from her epistemic beliefs, then she would believe , for either (**1-boxer**) or (**2-boxer**). In both cases, the fact that she decides cannot surprise her. And thus, .

Now the **1-boxers** out there may argue that **Alice** could still make *randomized* decision-making algorithms, for instance by throwing a coin at some point (or by drawing random bits), to make her decision. Alas, a more general version of the zero-surprise theorem applies.

Theorem (equal surprise).If the likelihood of data is the same in all theories, i.e. for any pair of theories and , and assuming that theories partition the probability space, then the posterior equals the prior, i.e. .

Proof.This also follows from Bayes rule, by expanding the denominator using the law of total probabilities and canceling likelihood terms.

As a result, assuming that the random bits of **Alice**’**s **decision-making algorithm are also unknown to **Omega **(when **Omega** estimated ), given that the probability of these bits will be the same whether **Box A **contains money or not, then **Alice** must conclude , even if she goes for a *randomized* decision. In this case, no matter what decision-making algorithm **Alice** opts for, **Alice **must conclude that the

**2-boxer**strategy yields counterfactually larger expected rewards.

*Counterfactual reasoning based on a transparent decision-making algorithm favors*

**2-boxing**.Theorem (transparent counterfactual optimization).IfAlice’sdecision-making algorithm uses counterfactual optimization, and ifAlice’sepistemic thinking knows that this is how decisions are made, thenAlicewould be a2-boxer.

In fact, it can be shown that a sufficient condition to this is **Alice’s** ability to predict deterministically **Omega’s** prediction of **Alice’s** decision-making algorithm. Indeed, **Alice’s **decision will not change **Alice’s** credence on **Omega’s** prediction , which will thus not change her credence on the contents of **Box A**.

# Opaque decision-making algorithm

Critically, the above conclusion considers that, in a sense, **Alice’s** decision-making algorithm is not fully known to **Alice’s **epistemic system. We got away by assuming that what **Alice** does not know about her algorithm is unknown to **Omega** too. But what if **Omega** had access to bits of **Alice’s** decision-making algorithm that **Alice’s** epistemic system does not quite know? More interestingly still, what if **Alice** knows that **Omega** had a better knowledge of her own decision-making algorithm when **Omega** made its prediction and decided the content of **Box A**?

This may sound outlandish. But it may not be so unreasonable. After all, our own decision-making algorithms are the result of numerous electrical impulses in our brains’ complex neural networks. Clearly, they are opaque to us. In fact, it may be particularly hard for us to predict how our decision-making algorithm will behave, especially in unusual settings — such as Newcomb’s paradox. And even if we’re given an oracle to compute Bayes rule on all our observed data, this oracle will likely still only return a probabilistic distribution over the set of algorithms that we apply to make decisions, since it does not have precise data about the topology of our brains.

Meanwhile, today’s machine learning algorithms already collect huge amounts of data about a lot of different humans. By leveraging data about how some humans behave in unusual settings, it seems possible for them to better infer how our decision-making algorithms will behave in such settings than we can infer it ourselves. After all, if we know what challenges our friends will face, and how people similar to our friends behave in these challenges, we can sometimes correctly bet that our friends will be doing well, even when they are more hesitant.

Well, the opacity of **Alice’s **decision-making algorithm to her epistemic system changes everything. Things now become very interesting! Intuitively, the fact that **Alice** now concludes that she will be a **1-boxer** reveals to her features of her decision-making algorithms, which will thus modify her belief on what **Omega** put in the box. And thus, it is now possible that she actually concludes that . In fact, we will provide an example where this is the case.

But before getting to our example, for the **2-boxers** out there, it’s worth detailing what this means intuitively. Of course, the content of **Box A** will not change because **Alice** chooses to opt for the **1-boxer** strategy. What *does* change, however, is **Alice’s** credence of what **Box A** contains. The world did not move. But **Alice’s** model of the world did. And this change of belief modifies her computation of her counterfactual expected gains.

# More knowledgeable Omega

To present a simple example of , we will assume that **Omega** has access to all of **Alice’s** data, in addition to some additional data that may inform **Omega** on **Alice’s** decision-making system. This will simplify computations as it allows us to apply the Bayesian theorem of the argument of authority.

Theorem (argument of authority).IfAliceandOmegaare honest Bayesians with the same fundamental prior, ifOmegaknows strictly more data thanAlice, and ifOmegatellsAliceabout a prediction it made, thenAlicemust now makeOmega’s prediction. More precisely, for any data and any random variable , we have .

Proof. Alice has a prior on data . The event rules out all data D for which . Thus all data D that still have nonzero probability given must satisfy . Decomposing the computation of using the law of total probability over all possible values of given then yields an average over values that all equal . This average must thus also equal .

An interesting corollary of the theorem is that it is actually now sufficient to express all the uncertainties of the problem in terms of **Omega**’s prediction , rather than in terms of the uncertainty on **Alice’s** decision-making algorithm. In particular, **Alice** can now easily derive her belief on the fact that her decision-making algorithm will make her choose the **1-boxer** strategy.

Lemma 1. Alice’sprior on the fact that she will be a1-boxeris equal to her expected prior onOmega’sprediction. In other words, denoting , we have.

Proof.To obtain this, we first apply the law of total probability to note that . Now, we note that ifOmegaknows strictly more thanAlice, thus ifOmegapredicts thatAlicewill opt for the1-boxerstrategy with probability , thenAlicemust now believe so. By the Bayesian theorem of the argument of authority, . As a result, we have .

Interestingly, we have a similar result on the prior probability that **Box A** contains money.

Lemma 2.Denote the event whereBox Acontains $1M. Then . In other words, fromAlice’sviewpoint, the probability that Box A contains $1M isAlice’sexpected prior onOmega’sprediction.

Proof.. Now, as we saw,Omega’sdecision-making process is such thatOmegaputs $1M in Box A with probability . As a result, we have .

We move on to the computation of the posterior probability that **Box A** contains $1M.

Lemma 3., where is the prior variance ofOmega’sprediction.

Proof.Bayes rule yields . Using as we saw in the two previous paragraphs, we now have .

Now, to compute , we use again the law of total probability, by conditioning over different values of . This yields . Now, the fact that the coin throw ofOmegais independent of the choice of the individual, conditioned on the estimation of ω byOmega,means that , which, as we saw, is equal to by virtue of the Bayesian theorem of the argument of authority. We thus have . For the other quantity , we use Bayes rule , which yields . We then note that, by definition ofOmega’schoice for the content ofBox A, we have . As we already saw, the denominator is . Overall, we obtain .

Using allows to conclude.

In other words, especially if **Alice** has a large uncertainty on **Omega’s** prediction, while assuming that **Omega** is likely predicting that she will be a 2-boxer, **Alice** will increase her credence that **Box A** contains $1M if her decision-making makes her choose **Box A**.

An immediate corollary of this is that, conversely, if the Bayesian opts for the 2-boxer strategy then her credence in Box A containing $1M decreases. Indeed, we have the following Bayesian no-confirmation-bias theorem.

Theorem (no confirmation bias).A Bayesian’s prior must be equal to the expectation of her posterior. She cannot expect to increase her belief after looking at more data. More precisely, .

Proof.This is exactly the law of total probability.

Thus, must be an average of and . But in fact, we can compute the last term explicitly.

Lemma 4..

Proof.By Bayes rule, . But we have , and . Combining it all yields .

Does it mean that we should opt for the **1-boxer** strategy? Well, it depends on the details of the computation. Let us denote . we then have .

Main Theorem.Counterfactual optimization favors1-boxinginequality, if and only if, we have .

Proof.Lemma 3 implies . Meanwhile, Lemma 4 implies . We then have . Therefore, we observe that , if and only if, .

Therefore, the expected gain is larger for the **1-boxer** if the possible large content of **Box A **is much larger than that of **Box B**, if the uncertainty on **Omega’s** prediction is large, and if **Omega’s** prediction is likely to be nearly deterministic. Arguably, this result captures some of the intuition we may have about this problem, as the **1-boxer** strategy may feel more compelling if and if we trust **Omega** to know much more about our own decision-making process than we do.

# Shouldn’t Alice *engage* in counterfactual optimization?

Now, given all of this, assuming , should we conclude that **Alice** must then choose to be a **1-boxer**?

Well, technically, all we said is that if **Alice** follows her decision-making algorithm, if **Omega** knows strictly more than **Alice**, in particular about her decision-making algorithm, and if , then counterfactual reasoning favors **1-boxing**. But weirdly enough, this does not mean that **Alice** should *decide* to be **1-boxing**. Indeed, if **Alice** chooses her decision-making algorithm, then this decision-making algorithm will now be transparent to her, and the theorem no longer applies.

In fact, if **Alice** engages in counterfactual optimization, then she will know her decision-making algorithm, and thus, by virtue of the theorem of transparent counterfactual optimization we proved above, she will be using the **2-boxer** strategy. Put differently, this theorem says that, if you have a *purely Bayesian* epistemic system, then *you can’t engage in counterfactual optimization and decide to be a 1-boxer.*

But say we’re now in the context of the previous section, where **Alice’s **decision-making algorithm is opaque to her. And assume that . Then, *we* can argue that **Alice** should be a **1-boxer**, based on counterfactual optimization. In fact, **Alice’s **own epistemic system may be shouting that she should be **1-boxing**. But this epistemic system is *not* **Alice’s** decision-making system. This decision-making algorithm *cannot *be changed.

But isn’t it just a *wrong* assumption? Can’t we decide that **Alice** *can* decide what decision-making algorithm to execute, perhaps even up to some noise?

Well, if so, then **Alice’s** decision-making algorithm would actually be none other than the meta-algorithm that outputs a decision-making algorithm to be executed, and then execute it. Yet a meta-algorithm is still an algorithm! And it is this meta-algorithm that may be better understood by **Omega** than by **Alice** herself.

This may be frustrating. You might want to ask: Can’t we at least say that the **1-boxer** wins more money than the **2-boxer** strategy?

Yes, definitely, because these are *descriptive* questions. And indeed, if both **Alice** and **Omega** know that **Alice’s** decision-making will tell her to use the **1-boxer** strategy, then **Alice** will know she will gain $1M. Conversely, if both know that **Alice** will be a **2-boxer**, then **Alice** will know she will gain $1,000. In the former case, **Alice** will win more than in the latter case. But, as we saw earlier, because of the zero-surprise theorem, in both cases, . Thus, in both cases, **Alice’s** epistemic system expected to win $1,000 more by taking both boxes. In both cases, her counterfactual optimization urged her to be a **2-boxer**.

# A few remarks for non-Bayesians

Now, unfortunately, those of us who are mere mortals usually don’t have access to a Bayesian oracle. Given our limited computational resources, how should we find inspiration from this Bayesian analysis of Newcomb’s paradox to draw conclusions on what we should actually do in Newcomb-like paradoxes?

The critical feature of our analysis arguably still stands. In a sense, our decision-making algorithms are partial black boxes to ourselves. We are running an algorithm whose code is unknown to us. As a result, we should expect to be surprised by some decisions we are making. And thus to *learn* something once we find out about what our decision-making systems outputs.

Interestingly, if now consider computational complexity, this principle becomes all the more compelling. In practice, because we have limited computing power, especially under the assumption of computational irreducibility*, *we should often be surprised by the result of our decision-making process. This principle is beautifully captured in this quote from Turing’s 1950 brilliant paper.

The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false. A natural consequence of doing so is that one then assumes that there is no virtue in the mere working out of consequences from data and general principles.

To rephrase this, even when you know *how* you decide, you may still not know *what* you decide. You should then not be surprised that some algorithm like **Omega** really can predict *what* you decide better than you can.

In this case, if you somehow thinks that **Omega** does predict better than you can, and if you are really unsure about what **Omega** will predict for you, then approximate counterfactual reasoning may suggest that you might want to engage with **1-boxing**, though this setting with limited computational resources would need to be further investigated to clear things out.

Weirdly, though, this suggests that the more you think about Newcomb’s paradox, the more (probably) you decrease your uncertainty on **Omega’s **prediction — assuming that it’s reasonably close to a Bayesian prediction. And thus, the more compelling the **2-boxer** strategy may seem to you.

# Conclusion

In this article, we carefully distinguished an individual’s decision-making algorithm from its epistemic system. Interestingly, this principle has been argued to be critical to *robust* AI alignment. It is probably useful for us humans as well. The job of deciding what to do is not the same as the job of figuring out a reliable model of the world. Interestingly, we commonly assume that the decision-making algorithm should exploit the epistemic system. But perhaps a neglected reflexion is the fact that the epistemic system must also describe the decision-making algorithm. In practice, it is usually unfortunate that our decision-making algorithms are somewhat opaque to ourselves — though, in the specific case of Newcomb’s paradox, it may be just what we need to counterfactually prefer **1-boxing**.

Note though that I’m leaving here open the question as to whether we should aim at counterfactual optimization as a decision rule. I personally still believe that this is the most natural form of decision rule, especially under a Bayesian framework when considering the von Neumann—Morgenstern preferences. I gladly acknowledge, however, my lack of understanding of alternatives.

An important feature of this article is a strong motivation to consistently apply Bayesianism for epistemic thinking. This motivation I have is the result of years of wonders within the formidable world of probability theory and its astonishing theorems, a few of which have been mentioned above. My fascination for Bayes rule has culminated in a book I recently published at CRC Press, called The Equation of Knowledge, which I presented in this LessWrong post. The book contains a large number of examples, many of which are pretty similar to what you have read in this article. Others are more empirical. If you found this article somewhat interesting, I bet that you’ll really enjoy the book.

(my decision-making algorithm urged me to share truthfully my epistemic thinking about this!)

People argue about the Newcomb’s paradox because they implicitly bury the freedom of choice of both the agent and the predictor in various unstated assumptions. Counterfactual approach is a prime example of it. For a free-will-free treatment of Newcomb’s and other decision theory puzzles, see my old post.

You might be interested in Cheating Death in Damascus, particularly section 3.1, which deals with CDT’s instability, and which points towards a more satisfying version of counterfactual reasoning.