Critiques of FDT Often Stem From Confusion About Newcomblike Problems

Since its inception in 1960, Newcomb’s Problem has continued to generate controversy. Here’s the problem as defined by Yudkowsky & Soares:
An agent finds herself standing in front of a transparent box labeled “A” that contains $1,000, and an opaque box labeled “B” that contains either $1,000,000 or $0. A reliable predictor, who has made similar predictions in the past and been correct 99% of the time, claims to have placed $1,000,000 in box B iff she predicted that the agent would leave box A behind. The predictor has already made her prediction and left. Box B is now empty or full. Should the agent take both boxes (“two-boxing”), or only box B, leaving the transparent box containing $1,000 behind (“one-boxing”)?
Functional Decision Theory (FDT) solves this problem by recognizing that the predictor is so reliable because she builds an accurate model of the agent. If you want to predict what a calculator will answer when asked to compute, e.g., 34 + 42, it helps a great deal—indeed, it seems necessary—to know what function the calculator implements. If you have an accurate model of this function (addition), then you can just calculate the answer (76) yourself and predict what the calculator will say. Likewise, if we assume the agent’s behavior in Newcomb’s Problem is also determined by a function—its decision procedure—then, if the predictor can model this function, she can accurately predict what the agent will do. An FDT agent’s decision procedure asks: “What output of this very decision procedure results in the best outcome?” Knowing that this very decision procedure is implemented by both the agent and the predictor (when she models the agent), and knowing the output is necessarily the same on both occasions (just as 34 + 42 equals 76 regardless of who or what is doing the computation), the answer can only be to one-box.
If two systems compute the same function, Yudkowsky and Soares say that these systems are subjunctively dependent upon that function. If you predict a calculator will answer 76 when prompted with 34 + 42, you and the calculator are subjunctively dependent upon the addition function. Likewise, the agent and the predictor are subjunctively dependent upon the agent’s decision procedure in Newcomb’s Problem.
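To make this concrete, here is a minimal sketch (my own illustration, not code from Yudkowsky & Soares) of what it means to choose the output of the shared decision procedure. The payoffs are those of the problem statement; the predictor’s 1% error rate is ignored for simplicity, and the function names are just placeholders.

```python
# Newcomb's Problem, viewed from the shared decision procedure.
# Under subjunctive dependence, the prediction and the action are two
# instances of the same computation, so they always match (ignoring the
# 1% error rate).

def payoff(action: str, prediction: str) -> int:
    """Agent's payoff given her action and the predictor's prediction."""
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000 if action == "two-box" else 0
    return box_a + box_b

def fdt_choice() -> str:
    """Ask: which output of this very procedure leads to the best outcome?"""
    return max(["one-box", "two-box"],
               key=lambda output: payoff(action=output, prediction=output))

print(fdt_choice())                   # one-box
print(payoff("one-box", "one-box"))   # 1000000
print(payoff("two-box", "two-box"))   # 1000
```

For any fixed prediction, two-boxing would pay $1,000 more, which is what causal reasoning notices; but under subjunctive dependence the prediction is (almost always) another run of the same procedure, so the only outcomes actually on the table are the two printed above.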
Given the assumption of subjunctive dependence between the agent and the predictor, the answer to Newcomb’s Problem must be one-boxing. And yet, FDT has received considerable criticism, much of which centers on what MacAskill has called “implausible recommendations” in Newcomblike problems. Consider MacAskill’s Bomb:
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
MacAskill comments:
The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left.
MacAskill calls this recommendation “implausible enough”. But while he is right that FDT Left-boxes, he’s wrong to say it does so “in the full knowledge that as a result you will slowly burn to death”. To see why, let’s first consider the following thought experiment.
Identical Rooms. Imagine you wake up in a white room. Omega sits next to your bed and says that 1 hour ago, he flipped a fair coin. If the coin came up heads, he put you to sleep in a white room; if the coin came up tails, he put you to sleep in another, identical white room. You are now in one of these two rooms. Omega gave you a pill that made you forget the events just before you went to sleep, and because the rooms are identical, you have no way of knowing whether the coin came up heads or tails. In front of you appears a special box, and you can choose to take it or leave it. Omega tells you the content of the box depends on whether the coin came up heads or tails. If it was tails, the box contains a fine of $100, which you’ll have to pay Omega if you take the box. If the coin came up heads, the box contains $10,000, which you get to keep should you take the box.
Question A) In order to make as much (expected) money as possible, should you take the box?
Question B) If the coin came up tails, should you take the box?
Question A is straightforward: there’s a 50/50 chance of winning $10,000 or losing $100, which comes out to an expected value of $4,950. Yes, you should take the box.
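Spelled out (a trivial check, just to make the arithmetic explicit):

```python
# Expected value of taking the box in Identical Rooms:
# heads (probability 1/2) pays $10,000, tails (probability 1/2) costs $100.
ev_take = 0.5 * 10_000 + 0.5 * (-100)
ev_leave = 0.0
print(ev_take)             # 4950.0
print(ev_take > ev_leave)  # True: take the box
```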
What about Question B? I hope it’s obvious this one doesn’t make any sense: it’s asking the wrong question. You have no way of knowing whether the coin came up heads or tails!
So why bring up this rather silly thought experiment? Because I believe MacAskill is essentially making the “Question B mistake” when he says:
The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death.
Remember: in Bomb, the predictor runs a simulation of you in order to make her prediction. This, of course, makes you and the predictor subjunctively dependent upon your decision procedure, and, more to the point, you can’t know whether you are the real “you” or the simulated version of you. If you could know, that knowledge would influence your decision procedure and break the subjunctive dependence. The simulated “you” observes the same things (or at least the same relevant things) as you do and therefore can’t tell she’s simulated. And if the simulated “you” Left-boxes, this doesn’t lead to you burning to death: it leads to an empty Left box and to you not losing $100.
In fact, it’s not so much that you can’t know whether you are the real “you” or the simulated “you”: you are both of them, at different times, and you have to make a decision taking this into account. Left-boxing simply leads to not burning to death AND not losing $100! (Unless the predictor made a mistake, but that probability is one in a trillion trillion.) In the Identical Rooms thought experiment, you are in only one of the two rooms (each with probability 1/2), but the point remains.
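Here is a rough sketch of the comparison FDT is actually making, under the assumption that the simulation runs your decision procedure; the dollar figure attached to burning to death is a made-up illustrative number, not something from MacAskill or Yudkowsky & Soares.

```python
# Bomb, compared at the level of policies (outputs of the shared decision
# procedure), assuming the predictor's simulation runs that same procedure.
FAILURE_RATE = 1e-24   # the predictor's stated error rate: 1 in a trillion trillion
BURN_COST = 1e9        # hypothetical dollar-equivalent disutility of burning to death

def expected_cost(policy: str) -> float:
    if policy == "Left":
        # If your procedure outputs Left, the simulation almost certainly output
        # Left too, so there is almost certainly no bomb in Left, and Left is free.
        return FAILURE_RATE * BURN_COST
    # Taking Right costs $100 regardless of what was predicted.
    return 100.0

print(expected_cost("Left"))   # 1e-15
print(expected_cost("Right"))  # 100.0
```

On these numbers, Left-boxing is the cheaper policy unless you value not burning to death at more than $100 divided by the failure rate, i.e. at more than 10^26 dollars.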
Identical Rooms was modelled after Counterfactual Mugging:

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But see, Omega tells you that if the coin came up heads instead of tails, it’d give you $10000, but only if you’d agree to give it $100 if the coin came up tails.
Omega can predict your decision in case it asked you to give it $100, even if that hasn’t actually happened, it can compute the counterfactual truth. Omega is also known to be absolutely honest and trustworthy, no word-twisting, so the facts are really as it says, it really tossed a coin and really would’ve given you $10000.
If we assume Omega predicts your decision by simulating you and modelling your decision procedure (and thus assume subjunctive dependence), then Counterfactual Mugging is isomorphic to Identical Rooms. Though it may seem like you know the coin came up tails in Counterfactual Mugging, there’s only a 1/2 probability that this is actually the case: if the coin came up heads, Omega simulates you, and the simulated Omega tells you exactly what the real Omega says given tails. So “What do you decide, given that the coin came up tails?” is the wrong question: you don’t actually know the outcome of the coin flip!
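The same point as a small Bayes calculation (again a sketch under the simulation assumption; the variable names are mine). Hearing “the coin came up tails” is equally likely whether the coin actually came up heads or tails, so the announcement tells you nothing about the coin:

```python
# Posterior probability that the coin actually came up tails, given that an
# Omega (real or simulated) just told you it came up tails.
p_tails = 0.5
p_told_tails_if_tails = 1.0   # the real Omega says it
p_told_tails_if_heads = 1.0   # the simulated Omega says the same thing to the simulated you

posterior_tails = (p_told_tails_if_tails * p_tails) / (
    p_told_tails_if_tails * p_tails + p_told_tails_if_heads * (1 - p_tails)
)
print(posterior_tails)  # 0.5 -- the announcement carries no information about the coin
```

And given that, the policy of paying up has the same expected value as taking the box in Identical Rooms: $4,950.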
In the original Newcomb’s Problem, you have to decide whether to one-box or to two-box knowing that you make this exact decision on two occasions: in the predictor’s simulation of you and in the “real world”. Given this, the correct answer has to be one-boxing. So although some seem to believe that it’s good to be the kind of person who one-boxes but that you should two-box when you’re actually deciding, FDT denies there’s a difference: one-boxers one-box, and those who one-box are one-boxers.