# Defending Functional Decision Theory

As I have been studying Functional Decision Theory (FDT) a lot recently, I have come across quite a few counterarguments and general remarks that are worth rebutting and/or discussing in more detail. This post is an attempt to do just that. Most points have been discussed in other posts, but as my understanding of FDT has grown, I decided to write this new post. For readers unfamiliar with FDT, I recommend reading Functional Decision Theory: A New Theory of Instrumental Rationality.

# The Bomb Argument

Originally proposed by William MacAskill:

You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay \$100 in order to be able to take it.

A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.

The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.

You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?

The argument against FDT, then, is that it recommends Left-boxing, which supposedly is wrong because it makes you slowly burn to death while you could have just paid \$100 instead.

## Analysis and Rebuttal

On Bomb, FDT indeed recommends Left-boxing. As the predictor seems to have a model of your decision procedure which she uses to make her prediction, FDT reasons that whatever you decide now, the predictor’s model of you also decided. If you Left-box, so did the model; if you Right-box, so did the model. If the model Left-boxed, then the predictor would have predicted you Left-box, and, crucially, not put a bomb in Left. If the model instead Right-boxed, there would be a bomb in Left. Reasoning this way, Left-boxing gives you a situation with no bomb (with probability a trillion trillion minus 1 out of a trillion trillion) where you don’t pay any money, while Right-boxing gets you one where you pay \$100. Left-boxing then clearly wins, assuming you don’t value your life higher than \$100 trillion trillion. Let’s assume you value your life at \$1,000,000.
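The comparison above can be checked with a few lines of arithmetic. This is only a sketch under the post's stated numbers (a failure rate of 1 in a trillion trillion, a life valued at \$1,000,000, a \$100 fee for Right); the function names are made up for illustration, and policies are evaluated the way FDT evaluates them, with the prediction depending on the policy.

```python
from fractions import Fraction

# Numbers taken from the problem statement and the assumption in the text.
FAILURE_RATE = Fraction(1, 10**24)  # predictor errs 1 time in a trillion trillion
LIFE_VALUE = 1_000_000              # dollar value placed on your life
RIGHT_COST = 100                    # fee for taking the Right box


def expected_utility(policy: str) -> Fraction:
    """Expected utility of being an agent who follows `policy`."""
    if policy == "left":
        # Predictor correct: no bomb, nothing paid. Predictor wrong: bomb.
        return -FAILURE_RATE * LIFE_VALUE
    if policy == "right":
        # A Right-boxer pays the fee whether or not the prediction was correct.
        return Fraction(-RIGHT_COST)
    raise ValueError(f"unknown policy: {policy}")


def break_even_life_value() -> Fraction:
    """Life value at which Left-boxing and Right-boxing are equally good."""
    return RIGHT_COST / FAILURE_RATE


if __name__ == "__main__":
    print("EU(Left) :", float(expected_utility("left")))
    print("EU(Right):", float(expected_utility("right")))
    print("Break-even life value:", break_even_life_value())
```

Exact fractions are used because the failure rate is far too small for floating-point comparisons to be trustworthy. The break-even value is 10^26 dollars, which is the \$100 trillion trillion mentioned above.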

### “But there is a bomb in Left! You burn to death!”

Well, the problem indeed specifies there is a bomb in Left, but this is as irrelevant as saying “But you’re in town already!” in Parfit’s Hitchhiker (note that this version of Parfit’s Hitchhiker asks whether you should pay once you’re already in town). There, you could say paying is irrational since you’re in town already and paying just loses you money. But if you are a non-paying agent talking to the driver, he will know you are a non-paying agent (by design of the problem), and never take you to town to begin with. Similarly, if you are a Left-boxer, the predictor in Bomb will not put a bomb in Left and you can save yourself \$100. Really: Left-boxing in Bomb is analogous to and just as rational as paying in Parfit’s Hitchhiker.

### “The predictor isn’t perfect. There can be a bomb in Left while you Left-box.”

So we’re focusing on that 1 in a trillion trillion case where the predictor is wrong? Great. FDT saves \$100 in 999,999,999,999,999,999,999,999 out of a trillion trillion cases and burns to death in 1 of them. FDT wins, period.

### “But the scenario focuses on that 1 in a trillion trillion case. It doesn’t mention the other 999,999,999,999,999,999,999,999 cases.”

No, it doesn’t just focus on that 1 in a trillion trillion case. It mentions the predictor, who predicts your decision with great accuracy, and then asks what decision you should make. That decision influences the prediction via subjunctive dependence. You can’t propose an extremely accurate predictor-of-your-decision and then expect me to reason as if that predictor’s prediction and my decision are independent of each other. Yes, the prediction can be wrong, but it can be—and almost certainly is—right too. It’s simply wrong to reason about a fixed prediction.

### “Look, if you had to choose before you know what’s in the boxes, Left-boxing might make sense. But that’s not the case!”

Yes, that’s exactly the case, due to subjunctive dependence between you and the predictor. The predictor runs a model of your decision procedure. Whatever you decide, that model also “decided”, before the predictor fixes the content of the boxes.

Bomb gives us 1 in a trillion trillion cases where FDT agents die horribly, and almost a trillion trillion cases where they save \$100. Bomb is an argument for FDT, not against it.

# The Procreation Argument

Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there’s a significant probability that I wouldn’t exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)

In Procreation, FDT agents have a much worse life than CDT agents.

## Analysis and Rebuttal

FDT agents indeed have a worse life than CDT agents in Procreation, but that has nothing to do with rationality and everything to do with the problem structure. An FDT agent would procreate, since that gives a high probability (let’s say 0.99) that she exists miserably, which she prefers to not existing at all. If life without children is valued at \$1,000,000 and life with children at \$100,000, then her expected utility for procreating is \$99,000. For not procreating, it is \$10,000. FDT therefore procreates. If you’re a CDT agent, it is assumed the father procreated; the expected utility for procreating, then, is \$100,000; for not procreating, it is \$1,000,000. CDT doesn’t procreate, and makes \$901,000 more in expected utility than FDT (\$1,000,000 versus \$99,000). But I hope the reader agrees that we’re not really discussing one problem here; we’re discussing two different problems, one for FDT and one for CDT. For each theory, there are very different probabilities on the table! Critiquing FDT with Procreation is like critiquing CDT because EDT gets more money in Newcomb’s Problem than CDT does in Parfit’s Hitchhiker. FDT agents choose the best option available to them in Procreation!
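The expected utilities in this paragraph can be reproduced with a small sketch. It uses the numbers from the text (0.99, \$1,000,000, \$100,000) and adds one assumption of its own: not existing is valued at \$0. The function names are illustrative only.

```python
from fractions import Fraction

# Numbers from the text; non-existence is assumed to be worth 0.
P_EXIST_IF_PROCREATE = Fraction(99, 100)
VALUE_WITHOUT_CHILDREN = 1_000_000
VALUE_WITH_CHILDREN = 100_000


def fdt_expected_utility(procreate: bool) -> Fraction:
    """FDT treats the father's choice as depending on the agent's own choice."""
    if procreate:
        # The father procreated too, so the agent very likely exists (miserably).
        return P_EXIST_IF_PROCREATE * VALUE_WITH_CHILDREN
    # The father most likely did not procreate, so the agent rarely exists.
    return (1 - P_EXIST_IF_PROCREATE) * VALUE_WITHOUT_CHILDREN


def cdt_expected_utility(procreate: bool) -> int:
    """CDT holds the father's choice fixed: the agent exists either way."""
    return VALUE_WITH_CHILDREN if procreate else VALUE_WITHOUT_CHILDREN


if __name__ == "__main__":
    for choice in (True, False):
        print(f"procreate={choice}: "
              f"FDT {fdt_expected_utility(choice)}, "
              f"CDT {cdt_expected_utility(choice)}")
```

The two functions use different probabilities for the same action, which is the sense in which the two theories are facing two different problems.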

Note that we can just as easily create a version of Procreation where CDT agents “have a much worse life” than FDT agents. Simply have the father be a CDT agent! In that case, FDT agents don’t procreate and have a happy life—and, notably, CDT agents, not using the subjunctive dependence between them and their father, still don’t procreate, and almost certainly cease to exist.

## A More Fair Version of Procreation

Any problem designed to compare two decision theories should at least give the same payoffs and probabilities for each decision theory. Therefore, here’s a more fair version of Procreation:

Procreation*. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed my very decision procedure. If I were to not procreate, there’s a significant probability that I wouldn’t exist. I highly value existing (even miserably existing).

Procreation* gives both FDT and CDT agents (and indeed, all agents) the same dilemma. FDT agents procreate and live miserably; CDT agents don’t procreate and almost certainly don’t exist. FDT beats CDT in this dilemma.

# The Tweak the Utility Function Argument

Alright, this one is not targeted at FDT per se, but it’s still important to discuss as it might hinder further development of FDT. In On Functional Decision Theory, Wolfgang Schwarz argues that where CDT makes the less-than-optimal decision, the trick is not to develop a new decision theory, but to tweak the utility function. I want to emphasize just how much this does not fix the problem. If your game AI doesn’t play chess very well, the right thing to do is to improve your algorithm, not to define the opening position of chess as a winning position for your AI.

For example, Schwarz argues that on the Psychological Twin Prisoner’s Dilemma, the agent should care about her twin’s prison years as well. If the agent cares about her and her twin’s prison years equally, then, based on these prison years, the payoff matrix becomes something like this:

Now cooperating is easily the best choice for CDT. Schwarz notes that if he “were to build an agent with the goal that they do well for themselves, I’d give them this kind of utility function, rather than implement FDT.” Of course you’d give them an altruistic utility function! However, CDT still doesn’t solve the Psychological Twin Prisoner’s Dilemma. It only fixes the version with the modified utilities, which is completely different (e.g. it has a different Nash Equilibrium). You may argue that a CDT agent with an altruistic utility function wouldn’t ever come across the original version of the problem—but the very fact that it can’t solve that relatively easy problem points at a serious flaw in its decision theory (CDT). It also suggests this isn’t the only problem CDT doesn’t solve correctly. This is indeed the case, and Schwarz goes on to make an ad hoc adjustment for CDT to solve Blackmail:

Blackmail. Donald has committed an indiscretion. Stormy has found out and considers blackmailing Donald. If Donald refuses and blows Stormy’s gaff, she is revealed as a blackmailer and his indiscretion becomes public; both suffer. It is better for Donald to pay hush money to Stormy. Knowing this, it is in Stormy’s interest to blackmail Donald. If Donald were irrational, he would blow Stormy’s gaff even though that would hurt him more than paying the hush money; knowing this, Stormy would not blackmail Donald. So Donald would be better off if he were (known to be) irrational.

Here, Schwarz suggests Donald should have a “strong sense of pride” or a “vengeful streak” in order to avoid being blackmailed. (Note that an altruistic player wouldn’t prefer not being blackmailed over paying Stormy.) The point is this: if your decision theory requires ad hoc fixes in the utility function, it’s not a good decision theory.

Schwarz:

FDT agents rarely find themselves in Blackmail scenarios. Neither do CDT agents with a vengeful streak. If I wanted to design a successful agent for a world like ours, I would build a CDT agent who cares what happens to others.

Well, and have a vengeful streak, or pride, apparently. Altruism doesn’t solve it all, it seems.

My CDT agent would still two-box in Newcomb’s Problem with Transparent Boxes (or in the original Newcomb Problem). But this kind of situation practically never arises in worlds like ours.

If your decision theory can’t solve Newcomb’s Problem, that’s probably a sign there are more problems it can’t solve. Indeed, Newcomblike problems are the norm.

# Argument Against Subjunctive Dependence

From A Critique of Functional Decision Theory by William MacAskill:

To see this, consider two calculators. The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not? Well, perhaps on this foreign calculator the ‘–’ symbol means what we usually take it to mean — namely, that the ensuing number is negative — and therefore every time we hit the ‘=’ button on the second calculator we are asking it to run the algorithm ‘compute the sum entered, then output the negative of the answer’. If so, then the calculators are systematically running different algorithms.

But perhaps, in this foreign land, the ‘–’ symbol, in this context, means that the ensuing number is positive and the lack of a ‘–’ symbol means that the number is negative. If so, then the calculators are running exactly the same algorithms; their differences are merely notational.

Ultimately, in my view, all we have, in these two calculators, are just two physical processes. The further question of whether they are running the same algorithm or not depends on how we interpret the physical outputs of the calculator. There is no deeper fact about whether they’re ‘really’ running the same algorithm or not. And in general, it seems to me, there’s no fact of the matter about which algorithm a physical process is implementing in the absence of a particular interpretation of the inputs and outputs of that physical process.

But if that’s true, then, even in the Newcomb cases where a Predictor is simulating you, it’s a matter of choice of symbol-interpretation whether the predictor ran the same algorithm that you are now running (or a representation of that same algorithm). And the way you choose that symbol-interpretation is fundamentally arbitrary. So there’s no real fact of the matter about whether the predictor is running the same algorithm as you. It’s indeterminate how you should act, given FDT: you should one-box, given one way of interpreting the inputs and outputs of the physical process the Predictor is running, but two-box given an alternative interpretation.

## Analysis and Rebuttal

The first thing to say here is that FDT’s subjunctive dependence is about functions, not algorithms: for example, counting sort and Quicksort are different sorting algorithms that compute the same function. However, the argument works the same if we replace “algorithm” with “function.” But perhaps most importantly, the properties of a calculator (or anything, really) can’t depend on how we interpret its output, because different people can interpret it differently. Therefore, the calculators in the example are implementing different functions: one of them maps “2 + 2” to “4”, the other maps “2 + 2” to “-4”. However, it does seem the second one uses the function of the first one as a “subfunction”: it needs to know the “real” answer to “2 + 2” in order to output “-4”. Therefore, the calculators are subjunctively dependent on that subfunction, even though their outputs are different. Even if the second calculator always outputs “[output of first calculator] + 1”, the calculators are still subjunctively dependent on that same function.
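One way to picture the shared subfunction is to pass the arithmetic in as a parameter and see what happens when it is swapped out. This is a toy sketch, not a formalization of subjunctive dependence: the function names are invented here, and `counterfactual_add` is a hypothetical stand-in for “what if 2 + 2 were 5”.

```python
from typing import Callable

Arithmetic = Callable[[int, int], int]


def ordinary_add(a: int, b: int) -> int:
    """The usual addition function that both calculators rely on."""
    return a + b


def counterfactual_add(a: int, b: int) -> int:
    """Hypothetical arithmetic in which 2 + 2 comes out as 5."""
    return 5 if (a, b) == (2, 2) else a + b


def first_calculator(add: Arithmetic, a: int, b: int) -> int:
    """Outputs the sum as is."""
    return add(a, b)


def second_calculator(add: Arithmetic, a: int, b: int) -> int:
    """Computes the same sum, then flips the sign of the output."""
    return -add(a, b)


if __name__ == "__main__":
    for add in (ordinary_add, counterfactual_add):
        print(add.__name__,
              first_calculator(add, 2, 2),
              second_calculator(add, 2, 2))
```

The two calculators never produce the same output, yet changing the shared `add` changes both outputs together, which is the dependence the paragraph describes.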

In Newcomb’s Problem, the idea seems to be that the predictor uses a model of your decision procedure that does use the same outputs as you, in which case the predictor is computing the same function as the agent. But, like with the calculators, even if the outputs are phrased differently, subjunctive dependence can still exist. It is of course up to the predictor how she interprets the outputs of the model, but there is a clearly “right” way to interpret them given that there is (full) subjunctive dependence going on between the agent and the predictor.

# The Agent-y Argument

Also in A Critique of Functional Decision Theory, MacAskill makes an argument that hinges on how “agent-y” a process is:

First, take some physical process S (like the lesion from the Smoking Lesion) that causes a ‘mere statistical regularity’ (it’s not a Predictor). And suppose that the existence of S tends to cause both (i) one-boxing tendencies and (ii) whether there’s money in the opaque box or not when decision-makers face Newcomb problems. If it’s S alone that results in the Newcomb set-up, then FDT will recommend two-boxing.

But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing. But this seems arbitrary — why should the fact that S’s causal influence on whether there’s money in the opaque box or not go via another agent make such a big difference? And we can think of all sorts of spectrum cases in between the ‘mere statistical regularity’ and the full-blooded Predictor: What if the ‘predictor’ is a very unsophisticated agent that doesn’t even understand the implications of what they’re doing? What if they only partially understand the implications of what they’re doing? For FDT, there will be some point of sophistication at which the agent moves from simply being a conduit for a causal process to instantiating the right sort of algorithm, and suddenly FDT will switch from recommending two-boxing to recommending one-boxing.

Second, consider that same physical process S, and consider a sequence of Newcomb cases, each of which gradually make S more and more complicated and agent-y, making it progressively more similar to a Predictor making predictions. At some point, on FDT, there will be a point at which there’s a sharp jump; prior to that point in the sequence, FDT would recommend that the decision-maker two-boxes; after that point, FDT would recommend that the decision-maker one-boxes. But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.

## Analysis and Rebuttal

The crucial error here is that whether “there’s an agent making predictions” is not the relevant factor for FDT. What matters is subjunctive dependence: two physical systems computing the same function. This definition doesn’t care about any of these systems being agents. So:

But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.

No. The problem remains the same as far as FDT is concerned (although maybe some uncertainty is added with the agent). There is no subjunctive dependence in this case, and adding the agent like this doesn’t help as it isn’t computing the same function as the main agent in the problem.

The rebuttal of MacAskill’s second example about S becoming gradually more “agent-y” is mostly the same: agent-ness doesn’t matter. However:

But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.

Why? I mean, there’s no sharp jump anyway (because there’s no subjunctive dependence), but in general, a tiny change in physical makeup can make a difference. For example, in Newcomb’s Problem, if the accuracy of the predictor drops below a threshold, two-boxing “suddenly” becomes the better choice. I can imagine a tiny change in physical makeup causing the predictor to predict just a little less accurately, dropping the accuracy from just above to just below the threshold.
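To make the threshold concrete, here is a sketch using the standard Newcomb payoffs of \$1,000,000 in the opaque box and \$1,000 in the transparent box. Those payoffs are an assumption, since this post does not state them, and the function names are illustrative.

```python
from fractions import Fraction

# Assumed payoffs (the usual Newcomb numbers, not given in this post).
BIG = 1_000_000   # opaque box, filled only if one-boxing was predicted
SMALL = 1_000     # transparent box, always available


def one_box_eu(accuracy: Fraction) -> Fraction:
    """Expected payoff for one-boxing at a given predictor accuracy."""
    return accuracy * BIG


def two_box_eu(accuracy: Fraction) -> Fraction:
    """Expected payoff for two-boxing at a given predictor accuracy."""
    return (1 - accuracy) * BIG + SMALL


def threshold() -> Fraction:
    """Accuracy at which both choices have the same expected payoff."""
    return Fraction(BIG + SMALL, 2 * BIG)


if __name__ == "__main__":
    t = threshold()
    eps = Fraction(1, 10**6)
    print("threshold:", t, "=", float(t))
    print("just above:", one_box_eu(t + eps) > two_box_eu(t + eps))
    print("just below:", one_box_eu(t - eps) > two_box_eu(t - eps))
```

With these payoffs the threshold sits at an accuracy of 1001/2000 (0.5005), so an arbitrarily small drop in accuracy across that point flips the recommendation.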

# Final Notes

In conclusion, none of the above arguments successfully undermine FDT. So far, it seems FDT does everything right that CDT does right while also doing everything right EDT does right, and all of that using a very plausible concept. Subjunctive dependence is a real thing: you know one calculator will output “5040” on “7!” if you just gave “7!” to another identical calculator. FDT needs to be developed further, but it certainly withstands the criticisms.

• What, precisely, is meant here by:

The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.

Is this a prediction that I would take Right, given that the predictor said that I would take Right?

Or is the note indicating that I would take Right in the absence of a note?

This is an important distinction I believe.

• However, it does seem the second one uses the function of the first one as “subfunction”: it needs to know the “real” answer to “2 + 2” in order to output “-4”. Therefore, the calculators are subjunctively dependent on that subfunction, even though their outputs are different. Even if the second calculator always outputs “[output of first calculator] + 1”, the calculators are still subjunctively dependent on that same function.

Why not reverse the situation? Couldn’t you just as well say that the calculator that outputs 4 is subjunctively dependent on the calculator that outputs −4, since it needs to know that the real answer to the second is −4 in order to drop the “−” and output 4?

• Direction of causality, or even causality itself, is irrelevant to FDT. Subjunctive dependence is simply a statement that two variables are not independent across the possible worlds conditional on parameters of interest. It doesn’t say that one causes the other or that they have a common cause.

In the calculator example, the variables are the outputs of the two calculators, and the parameters of interest are inputs common to both calculators. In this case the dependence is extremely strong: there is a 1:1 relation between the outputs for any given input in all possible worlds where both calculators are functioning correctly.

For the purposes of FDT, the relevant subjunctive dependence is that between the decision process outputs and the outcomes, and the variables of interest are the inputs to the decision process. In carefully constructed scenarios such as Newcomb’s problem, the subjunctive dependence is total: Omega is a perfect predictor. When the dependence is weaker, the details matter more—but still causality is irrelevant.

In the case of weaker dependence you can get something like a direction of dependence, in that perhaps each value of variable A corresponds to a single value of variable B across possible worlds, but not vice versa. This still doesn’t indicate causality.

• What I have in mind is stuff like this:

FDT can require that P come augmented with information about the logical, mathematical, computational, causal, etc. structure of the world more broadly. Given a graph G that tells us how changing a logical variable affects all other variables, we can re-use Pearl’s do operator to give a decision procedure for FDT

FDT seems to rely heavily on this sort of assumption, but also seems to lack any sort of formalization of how the logical graphs work.

• Interesting point. It seems to me that given MacAskill’s original setup of the calculators, the second one really does calculate the first one’s function and adds the -. Like, if 2 + 2 were to equal 5 tomorrow, the first calculator would output 5 and the second one −5.

• Idk. MacAskill’s setup is kinda messy because it involves culture and physics and computation too; these layers introduce all sorts of complexity that make it hard to analyze. Whereas you seem to say that causality is meaningful for logic and for mathematical functions too.

So let’s stay within math. Suppose for instance we represent functions in the common way, with f being represented as its graph { (x, y) where y = f(x) }. Under what conditions does one such set cause another?

• Procreation* gives both FDT and CDT agents (and indeed, all agents) the same dilemma. FDT agents procreate and live miserably; CDT agents don’t procreate and almost certainly don’t exist. FDT beats CDT in this dilemma.

This doesn’t seem right: you already exist! In order to say that “FDT beats CDT” I think you have to argue that one should care about the number of branches you exist in—which is what you plausibly have uncertainty about, not about whether this very instance of you exists. (And this is arguably just about preferences, as Christiano writes about here. So it is unclear what it would even mean to say that “FDT beats CDT”.) That is, this is about implementing a specific version of mixed-upside updatelessness or not—specifically, the multiverse version of MUU I describe here.

• Thanks for your reaction!

This doesn’t seem right: you already exist!

Sure, I already exist; together with the fact that I make the exact same decision my father made, that implies I procreate and therefore I’m not a CDT’er.

The point with these problems is, I believe, that your decision procedure is implemented at least 1 time, but possibly 2 times throughout time—depending on what your decision procedure outputs.

In Procreation*, if “my” decision procedure outputs “procreate”, it first does so “in” my father, who then procreates, causing me to exist. I then also procreate.

But if “my” decision procedure outputs “don’t procreate”, it also first does so “in” my father, who then doesn’t procreate, and then I don’t exist.

The question “Should I procreate?” is a bit misleading, then, as I possibly don’t exist.

Or, we indeed assume I do exist; but then it’s not much of a decision problem anymore. If I exist, then my father procreated, and I necessarily procreate too.

• (Warning: this is a bit of a sidenote. There are very likely other related problems that do not suffer this issue. I might suggest that the argument and chain of logic in this post would be stronger if you chose another variant.)

If you’re the last person in the universe, knowing that you’ll never see anyone else again ever, why does \$100 have any value to you?

Response: just decrease the failure rate arbitrarily far until it does balance again.

Counter-response A: the failure rate cannot be reduced arbitrarily far, because P(I misinterpreted what the agent is saying) is positive & non-zero.

Counter-response B: if the value of the money to me is negative—which given I’m hauling around a piece of paper when I’m literally the last person in the universe it very well may be—there is no such failure rate.

• Yeah, the \$100 wouldn’t have value, but we can assume for the problem at hand that Right-boxing comes with a cost that, expressed in dollars, equals 100, just like I expressed the value of living in dollars, at \$1,000,000.