Did EDT get it right all along? Introducing yet another medical Newcomb problem

One of the main arguments given against Evidential Decision Theory (EDT) is that it would “one-box” in medical Newcomb problems. Whether this is the winning action has been a hotly debated issue on LessWrong. A majority, including experts in the area such as Eliezer Yudkowsky and Wei Dai, seem to think that one should two-box (See e.g. Yudkowsky 2010, p.67). Others have tried to argue in favor of EDT by claiming that the winning action would be to one-box, or by offering reasons why EDT would in some cases two-box after all. In this blog post, I want to argue that EDT gets it right: one-boxing is the correct action in medical Newcomb problems. I introduce a new thought experiment, the Coin Flip Creation problem, in which I believe the winning move is to one-box. This new problem is structurally similar to other medical Newcomb problems such as the Smoking Lesion, though it might elicit the intuition to one-box even in people who would two-box in some of the other problems. I discuss both how EDT and other decision theories would reason in the problem and why people’s intuitions might diverge in different formulations of medical Newcomb problems.

Two kinds of Newcomblike problems

There are two different kinds of Newcomblike problems. In Newcomb’s original paradox, both EDT and Logical Decision Theories (LDT), such as Timeless Decision Theory (TDT) would one-box and therefore, unlike CDT, win $1 million. In medical Newcomb problems, EDT’s and LDT’s decisions diverge. This is because in the latter, a (physical) causal node that isn’t itself a decision algorithm influences both the current world state and our decisions – resulting in a correlation between action and environment but, unlike the original Newcomb, no “logical” causation.

It’s often unclear exactly how a causal node can exert influence on our decisions. Does it change our decision theory, utility function, or the information available to us? In the case of the Smoking Lesion problem, it seems plausible that it’s our utility function that is being influenced. But then it seems that as soon as we observe our utility function (“notice a tickle”; see Eells 1982), we lose “evidential power” (Almond 2010a, p.39), i.e. there’s nothing new to learn about our health by acting a certain way if we already know our utility function. In any case, as long as we don’t know and therefore still have the evidential power, I believe we should use it.

The Coin Flip Creation Problem is an adaption of Caspar Oesterheld’s “Two-Boxing Gene” problem and, like the the latter, attempts to take Newcomb’s original problem and make it into a medical Newcomb problem, triggering the intuition that we should one-box. In Oesterheld’s Two-Boxing Gene, it’s stated that a certain gene correlates with our decision to one-box or two-box in Newcomb’s problem, and that Omega, instead of simulating our decision algorithm, just looks at this gene.

Unfortunately, it’s not specified how the correlation between two-boxing and the gene arises, casting doubt on whether it’s a medical Newcomb problem at all, and whether other decision algorithms would disagree with one-boxing. Wei Dai argues that in the Two-Boxing Gene, if Omega conducts a study to find out which genes correlate with which decision algorithm, then Updateless Decision Theory (UDT) could just commit to one-boxing and thereby determine that all the genes UDT agents have will always correlate with one-boxing. So in some sense, UDT’s genes will still indirectly constitute a “simulation” of UDT’s algorithm, and there is a logical influence between the decision to one-box and Omega’s decision to put $1 million in box A. Similar considerations could apply for other LDTs.

The Coin Flip Creation problem is intended as an example of a problem in which EDT would give the right answer, but all causal and logical decision theories would fail. It works explicitly through a causal influence on the decision theory itself, thus reducing ambivalence about the origin of the correlation.

The Coin Flip Creation problem

One day, while pondering the merits and demerits of different acausal decision theories, you’re visited by Omega, a being assumed to possess flawless powers of prediction and absolute trustworthiness. You’re presented with Newcomb’s paradox, but with one additional caveat: Omega informs you that you weren’t born like a normal human being, but were instead created by Omega. On the day you were born, Omega flipped a coin: If it came up heads, Omega created you in such a way that you would one-box when presented with the Coin Flip Creation problem, and it put $1 million in box A. If the coin came up tails, you were created such that you’d two-box, and Omega didn’t put any money in box A. We don’t know how Omega made sure what your decision would be. For all we know, it may have inserted either CDT or EDT into your source code, or even just added one hard-coded decision rule on top of your messy human brain. Do you choose both boxes, or only box A?

It seems like EDT gets it right: one-boxing is the winning action here. There’s a correlation between our decision to one-box, the coin flip, and Omega’s decision to put money in box A. Conditional on us one-boxing, the probability that there is money in box A increases, and we “receive the good news” – that is, we discover that the coin must have come up heads, and we thus get the million dollars. In fact, we can be absolutely certain of the better outcome if we one-box. However, the problem persists if the correlation between our actions and the content of box A isn’t perfect. As long as the correlation is high enough, it is better to one-box.

Nevertheless, neither causal nor logical counterfactuals seem to imply that we can determine whether there is money in box A. The coin flip isn’t a decision algorithm itself, so we can’t determine its outcome. The logical uncertainty about our own decision output doesn’t seem to coincide with the empirical uncertainty about the outcome of the coin flip. In absence of a causal or logical link between their decision and the content of box A, CDT and TDT would two-box.

Updateless Decision Theory

As far as I understand, UDT would come to a similar conclusion. AlephNeil writes in a post about UDT:

In the Smoking Lesion problem, the presence of a ‘lesion’ is somehow supposed to cause Player’s to choose to smoke (without altering their utility function), which can only mean that in some sense the Player’s source code is ‘partially written’ before the Player can exercise any control over it. However, UDT wants to ‘wipe the slate clean’ and delete whatever half-written nonsense is there before deciding what code to write.

Ultimately this means that when UDT encounters the Smoking Lesion, it simply throws away the supposed correlation between the lesion and the decision and acts as though that were never a part of the problem.

This approach seems wrong to me. If we use an algorithm that changes our own source code, then this change, too, has been physically determined and can therefore correlate with events that aren’t copies of our own decision algorithm. If UDT reasons as though it could just rewrite its own source code and discard the correlation with the coin flip altogether, then UDT two-boxes and thus by definition ends up in the world where there is no money in box A.

Note that updatelessness seemingly makes no difference in this problem, since it involves no a priori decision: Before the coin flip, there’s a 50% chance of becoming either a one-boxing or a two-boxing agent. In any case, we can’t do anything about the coin flip, and therefore also can’t influence whether box A contains any money.

I am uncertain how UDT works, though, and would be curious about others people’s thoughts. Maybe UDT reasons that by one-boxing, it becomes a decision theory of the sort that would never be installed into an agent in a tails world, thus rendering impossible all hypothetical tails worlds with UDT agents in them. But if so, why wouldn’t UDT “one-box” in the Smoking Lesion? As far as the thought experiments are specified, the causal connection between coin flip and two-boxing in the Coin Flip Creation appears to be no different from the connection between gene and smoking in the Smoking Lesion.

More adaptations and different formalizations of LDTs exist, e.g. Proof-Based Decision Theory. I could very well imagine that some of those might one-box in the thought experiment I presented. If so, then I’m once again curious as to where the benefits of such decision theories lie in comparison to plain EDT (aside from updatelessness – see Concluding thoughts).

Coin Flip Creation, Version 2

Let’s assume UDT would two-box in the Coin Flip Creation. We could alter our thought experiment a bit so that UDT would probably one-box after all:

The situation is identical to the Coin Flip Creation, with one key difference: After Omega flips the coin and creates you with the altered decision algorithm, it actually simulates your decision, just as in Newcomb’s original paradox. Only after Omega has determined your decision via simulation does it decide whether to put money in box A, conditional on your decision. Do you choose both boxes, or only box A?

Here is a causal graph for the first and second version of the Coin Flip Creation problem. In the first version, a coin flip determines whether there is money in box A. In the second one, a simulation of your decision algorithm decides:

Since in Version 2, there’s a simulation involved, UDT would probably one-box. I find this to be a curious conclusion. The situation remains exactly the same – we can rule out any changes in the correlation between our decision and our payoff. It seems confusing to me, then, that the optimal decision should be a different one.

Copy-altruism and multi-worlds

The Coin Flip Creation problem assumes a single world and an egoistic agent. In the following, I want to include a short discussion of how the Coin Flip Creation would play out in a multi-world environment.

Suppose Omega’s coin is based on a quantum number generator and produces 50% heads worlds and 50% tails worlds. If we’re copy-egoists, EDT still recommends to one-box, since doing so would reveal to us that we’re in one of the branches in which the coin came up heads. If we’re copy-altruists, then in practice, we’d probably care a bit less about copies whose decision algorithms have been tampered with, since they would make less effective use of the resources they gain than we ourselves would (i.e. their decision algorithm sometimes behaves differently). But in theory, if we care about all the copies equally, we should be indifferent with respect to one-boxing or two-boxing, since there will always be 50% of us in either of the worlds no matter what we do. The two groups always take the opposite action. The only thing we can change is whether our own copy belongs to the tails or the heads group.

To summarize, UDT and EDT would both be indifferent in the altruistic multi-world case, but UDT would (presumably) two-box, and EDT would one-box, in both the copy-egoistic multi-worlds and in the single-world case.

“But I don’t have a choice”

There seems to be an especially strong intuition of “absence of free will” inherent to the Coin Flip Creation problem. When presented with the problem, many respond that if someone had created their source code, they didn’t have any choice to begin with. But that’s the exact situation in which we all find ourselves at all times! Our decision architecture and choices are determined by physics, just like a hypothetical AI’s source code, and all of our choices will thus be determined by our “creator.” When we’re confronted with the two boxes, we know that our decisions are predetermined, just like every word of this blogpost has been predetermined. But that knowledge alone won’t help us make any decision. As far as I’m aware, even an agent with complete knowledge of its own source code would have to treat its own decision outputs as uncertain, or it would fail to implement a decision algorithm that takes counterfactuals into account.

Note that our decision in the Coin Flip Creation is also no less determined than in Newcomb’s paradox. In both cases, the prediction has been made, and physics will guide our thoughts and our decision in a deterministic and predictable manner. Nevertheless, we can still assume that we have a choice until we make our decision, at which point we merely “find out” what has been our destiny all along.

Concluding thoughts

I hope that the Coin Flip Creation motivates some people to reconsider EDT’s answers in Newcomblike problems. A thought experiment somewhat similar to the Coin Flip Creation can be found in Arif Ahmed 2014.

Of course, the particular setup of the Coin Flip Creation means it isn’t directly relevant to the question of which decision theory we should program into an AI. We obviously wouldn’t flip a coin before creating an AI. Also, the situation doesn’t really look like a decision problem from the outside; an impartial observer would just see Omega forcing you to pick either A or B. Still, the example demonstrates that from the inside view, evidence from the actions we take can help us achieve our goals better. Why shouldn’t we use this information? And if evidential knowledge can help us, why shouldn’t we allow a future AI to take it into account? In any case, I’m not overly confident in my analysis and would be glad to have any mistakes pointed out to me.

Medical Newcomb is also not the only class of problems that challenge EDT. Evidential blackmail is an example of a different problem, wherein giving the agent access to specific compromising information is used to extract money from EDT agents. The problem attacks EDT from a different angle, though: namely by exploiting it’s lack of updatelessness, similar to the challenges in Transparent Newcomb, Parfit’s Hitchhiker, Counterfactual Mugging, and the Absent-Minded Driver. I plan to address questions related to updatelessness, e.g. whether it makes sense to give in to evidential blackmail if you already have access to the information and haven’t precommitted not to give in, at a later point.


I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.