A Critique of Functional Decision Theory
NB: My writing this note was prompted by Carl Shulman, who suggested we could try a low-time-commitment way of attempting to understand the disagreement between some folks in the rationality community and academic decision theorists (including myself, though I’m not much of a decision theorist). Apologies that it’s sloppier than I’d usually aim for in a philosophy paper, and lacking in appropriate references. And, even though the paper is pretty negative about FDT, I want to emphasise that my writing this should be taken as a sign of respect for those involved in developing FDT. I’ll also caveat that I’m unlikely to have time to engage in the comments; I thought it was better to get this out there all the same rather than delay publication further.
There’s a long-running issue where many in the rationality community take functional decision theory (and its variants) very seriously, but the academic decision theory community does not. But there’s been little public discussion of FDT from academic decision theorists (one exception is here); this note attempts to partly address this gap.
So that there’s a clear object of discussion, I’m going to focus on Yudkowsky and Soares’ ‘Functional Decision Theory’ (which I’ll refer to as Y&S), though I also read a revised version of Soares and Levinstein’s Cheating Death in Damascus.
This note is structured as follows. Section II describes causal decision theory (CDT), evidential decision theory (EDT) and functional decision theory (FDT). Sections III-VI describe problems for FDT: (i) that it sometimes makes bizarre recommendations, recommending an option that is certainly lower-utility than another option; (ii) that it fails to one-box in most instances of Newcomb’s problem, even though the correctness of one-boxing is supposed to be one of the guiding motivations for the theory; (iii) that it results in implausible discontinuities, where what is rational to do can depend on arbitrarily small changes to the world; and (iv) that, because there’s no real fact of the matter about whether a particular physical process implements a particular algorithm, it’s deeply indeterminate what FDT’s implications are. In section VII I discuss the idea that FDT ‘does better at getting utility’ than EDT or CDT; I argue that Y&S’s claims to this effect are unhelpfully vague, and on any more precise way of understanding their claim, aren’t plausible. In section VIII I briefly describe a view that captures some of the motivation behind FDT, and in my view is more plausible. I conclude that FDT faces a number of deep problems and that there is little to say in its favour.
In what follows, I’m going to assume a reasonable amount of familiarity with the debate around Newcomb’s problem.
II. CDT, EDT and FDT
Informally: CDT, EDT and FDT differ in what non-causal correlations they care about when evaluating a decision. For CDT, what you cause to happen is all that matters; if your action correlates with some good outcome, that’s nice to know, but it’s not relevant to what you ought to do. For EDT, all correlations matter: you should pick whatever action will result in you believing you will have the highest expected utility. For FDT, only some non-causal correlations matter, namely only those correlations between your action and events elsewhere in time and space that would be different in the (logically impossible) worlds in which the output of the algorithm you’re running is different. Other than for those correlations, FDT behaves in the same way as CDT.
Formally, where $S$ ranges over states of nature, $A$, $B$ etc represent acts, $P$ is a probability function, $U(A, S)$ represents the utility the agent gains from the outcome of choosing $A$ given state $S$, and ‘≽’ represents the ‘at least as choiceworthy as’ relation:
On EDT:
$$A \succcurlyeq B \iff \sum_{S} P(S \mid A)\,U(A, S) \;\ge\; \sum_{S} P(S \mid B)\,U(B, S)$$
Where ‘|’ represents conditional probability.
On CDT:
$$A \succcurlyeq B \iff \sum_{S} P(S \setminus A)\,U(A, S) \;\ge\; \sum_{S} P(S \setminus B)\,U(B, S)$$
Where ‘∖’ is a ‘causal probability function’ that represents the decision-maker’s judgments about her ability to causally influence the events in the world by doing a particular action. Most often, this is interpreted in counterfactual terms (so $P(S \setminus A)$ represents something like the probability of $S$ coming about were I to choose $A$) but it needn’t be.
On FDT:
$$A \succcurlyeq B \iff \sum_{S} P(S \dagger A)\,U(A, S) \;\ge\; \sum_{S} P(S \dagger B)\,U(B, S)$$
Where I introduce the operator ‘†’ to represent the special sort of function that Yudkowsky and Soares propose, where $P(S \dagger A)$ represents the probability of $S$ occurring were the output of the algorithm that the decision-maker is running, in this decision situation, to be $A$. (I’m not claiming that it’s clear what this means. E.g. see here, second bullet point, arguing there can be no such probability function, because any probability function requires certainty in logical facts and all their entailments. I also note that strictly speaking FDT doesn’t assess acts in the same sense that CDT assesses acts; rather it assesses algorithmic outputs, and that Y&S have a slightly different formal set-up than the one I describe above. I don’t think this will matter for the purposes of this note, though.)
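The definitions in this section share one shape: an act's value is a probability-weighted sum of utilities, and the theories differ only in which probability function is plugged in. The following is a minimal sketch of that shape applied to Newcomb's problem, not a rendering of Y&S's own formalism. The payoffs, the 0.99 predictor accuracy, the 0.5 causal credence and all function names are assumptions introduced here for illustration. No FDT probability function is supplied, since what it should be is exactly what is in dispute.

```python
from typing import Callable, Dict, List, Tuple

# prob(state, act) stands in for P(S|A), P(S∖A) or P(S†A), depending on the theory.
ProbFn = Callable[[str, str], float]

STATES: List[str] = ["full", "empty"]  # contents of the opaque box

# Hypothetical payoffs, (state, act) -> utility, treating dollars as utility.
UTILITY: Dict[Tuple[str, str], float] = {
    ("full", "one-box"): 1_000_000.0,
    ("empty", "one-box"): 0.0,
    ("full", "two-box"): 1_001_000.0,
    ("empty", "two-box"): 1_000.0,
}

ACCURACY = 0.99  # hypothetical predictor accuracy


def expected_utility(act: str, prob: ProbFn) -> float:
    """Probability-weighted sum of utilities over states, for a given act."""
    return sum(prob(state, act) * UTILITY[(state, act)] for state in STATES)


def at_least_as_choiceworthy(a: str, b: str, prob: ProbFn) -> bool:
    """The '≽' relation, relative to whichever probability function is supplied."""
    return expected_utility(a, prob) >= expected_utility(b, prob)


def p_evidential(state: str, act: str) -> float:
    """P(S|A): the box is full exactly when one-boxing was predicted."""
    prediction_matches_act = (state == "full") == (act == "one-box")
    return ACCURACY if prediction_matches_act else 1 - ACCURACY


def p_causal(state: str, act: str) -> float:
    """P(S∖A): the act has no causal influence on the contents (credence assumed 0.5)."""
    return 0.5


if __name__ == "__main__":
    for name, prob in [("evidential", p_evidential), ("causal", p_causal)]:
        prefers_one_box = at_least_as_choiceworthy("one-box", "two-box", prob)
        print(name, "one-box" if prefers_one_box else "two-box")
```

With these assumed numbers the evidential function favours one-boxing and the causal function favours two-boxing, whatever fixed credence the causal function assigns to the box being full.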
With these definitions on board, we can turn to objections to FDT.
III. FDT sometimes makes bizarre recommendations
The criterion that Y&S regard as most important in assessing a decision theory is ‘amount of utility achieved’. I think that this idea is importantly underspecified (which I discuss more in section VII), but I agree with the spirit of it. But FDT does very poorly by that criterion, on any precisification of it.
In particular, consider the following principle:
Guaranteed Payoffs: In conditions of certainty (that is, when the decision-maker has no uncertainty about what state of nature she is in, and no uncertainty about what the utility payoff of each action is), the decision-maker should choose the action that maximises utility.
That is: for situations where there’s no uncertainty, we don’t need to appeal to expected utility theory in any form to work out what we ought to do. You just ought to do whatever will give you the highest utility payoff. This should be a constraint on any plausible decision theory. But FDT violates that principle.
Consider the following case:
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left. In contrast, the right action on CDT or EDT is to take Right.
The recommendation is implausible enough. But if we stipulate that in this decision-situation the decision-maker is certain of the outcome that her actions would bring about, we see that FDT violates Guaranteed Payoffs.
(One might protest that no good Bayesian would ever have credence 1 in an empirical proposition. But, first, that depends on what we count as ‘evidence’: if a proposition is part of your evidence base, you have credence 1 in it. And, second, we could construct very similar principles to Guaranteed Payoffs that don’t rely on the idea of certainty, but on approximations to certainty.)
Note that FDT’s recommendation in this case is much more implausible than even the worst of the prima facie implausible recommendations of EDT or CDT. So, if we’re going by appeal to cases, or by ‘who gets more utility’, FDT is looking very unmotivated.
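The contrast in the bomb case can be put in numbers. The sketch below sets the reasoning the text attributes to FDT beside a direct application of Guaranteed Payoffs. The utility assigned to burning to death is a hypothetical placeholder, dollars are treated as utility, and the function names are introduced here for illustration; only the failure rate and the $100 fee come from the case as described.

```python
FAILURE_RATE = 1e-24      # "1 in a trillion trillion", from the case
U_BURN = -1_000_000.0     # hypothetical disutility of burning to death
U_PAY = -100.0            # paying $100 to take Right
U_FREE = 0.0              # taking an empty Left box at no cost


def guaranteed_payoffs_choice(payoffs: dict) -> str:
    """Under certainty, pick the act with the highest known payoff."""
    return max(payoffs, key=payoffs.get)


# The note says there is a bomb in Left, so the state is known.
KNOWN_PAYOFFS = {"Left": U_BURN, "Right": U_PAY}


def value_if_algorithm_outputs(output: str) -> float:
    """Value of an output, supposing the simulation produced the same output.

    If the algorithm outputs Left, the predictor almost certainly predicted Left
    and left the box empty; if it outputs Right, the agent simply pays the fee.
    """
    if output == "Left":
        return (1 - FAILURE_RATE) * U_FREE + FAILURE_RATE * U_BURN
    return U_PAY


if __name__ == "__main__":
    print("Guaranteed Payoffs:", guaranteed_payoffs_choice(KNOWN_PAYOFFS))
    print("Algorithm-level reasoning:",
          max(["Left", "Right"], key=value_if_algorithm_outputs))
```

On these assumptions the algorithm-level calculation favours Left for any finite disutility of death smaller in magnitude than 100 divided by the failure rate, while the act-level comparison under certainty favours Right.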
IV. FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it
On FDT, you consider what things would look like in the closest (logically impossible) world in which the algorithm you are running were to produce a different output than what it in fact does. Because, so the argument goes, in Newcomb problems the predictor is also running your algorithm, or a ‘sufficiently similar’ algorithm, or a representation of your algorithm, you consider the correlation between your action and the predictor’s prediction (even though you don’t consider other sorts of correlations.)
However, the Predictor needn’t be running your algorithm, or have anything like a representation of that algorithm, in order to predict whether you’ll one-box or two-box. Perhaps the Scots tend to one-box, whereas the English tend to two-box. Perhaps the Predictor knows how you’ve acted prior to that decision. Perhaps the Predictor painted the transparent box green, and knows that’s your favourite colour and you’ll struggle not to pick it up. In none of these instances is the Predictor plausibly doing anything like running the algorithm that you’re running when you make your decision. But they are still able to predict what you’ll do. (And bear in mind that the Predictor doesn’t even need to be very reliable. As long as the Predictor is better than chance, a Newcomb problem can be created.)
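To make the ‘better than chance’ point concrete, here is a short worked calculation. It assumes the standard set-up, which this note does not spell out: an amount $M$ in the opaque box if one-boxing was predicted, a guaranteed amount $K$ in the transparent box, and a Predictor who is correct with probability $p$ whichever choice is made. The symbols $M$, $K$ and $p$ are introduced here for illustration, and the expectations are computed evidentially.

```latex
\begin{align*}
EU(\text{one-box}) &= p\,M \\
EU(\text{two-box}) &= (1 - p)\,M + K \\
EU(\text{one-box}) > EU(\text{two-box}) &\iff p > \tfrac{1}{2} + \tfrac{K}{2M}
\end{align*}
```

With the usual figures of $M = 1{,}000{,}000$ and $K = 1{,}000$, the threshold is $p > 0.5005$, so a Predictor only slightly better than chance is enough for the evidential and causal verdicts to come apart.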
In fact, on the vast majority of ways that the Predictor could be predicting your behavior, she isn’t running the algorithm that you are running, or representing it. But if the Predictor isn’t running the algorithm that you are running, or representing it, then, on the most natural interpretation, FDT will treat this as ‘mere statistical correlation’, and therefore act like CDT. So, in the vast majority of Newcomb cases, FDT would recommend two-boxing. But the intuition in favour of one-boxing in Newcomb cases was exactly what was supposed to motivate FDT in the first place.
Could we instead interpret FDT, such that it doesn’t have to require the Predictor to be running the exact algorithm — some similar algorithm would do? But I’m not sure how that would help: in the examples given above, the Predictor’s predictions aren’t based on anything like running your algorithm. In fact, the predictor may know very little about you, perhaps only whether you’re English or Scottish.
One could suggest that, even though the Predictor is not running a sufficiently similar algorithm to you, nonetheless the Predictor’s prediction is subjunctively dependent on your decision (in the Y&S sense of ‘subjunctive’). But, without any account of Y&S’s notion of subjunctive counterfactuals, we just have no way of assessing whether that’s true or not. Y&S note that specifying an account of their notion of counterfactuals is an ‘open problem,’ but the problem is much deeper than that. Without such an account, it becomes completely indeterminate what follows from FDT, even in the core examples that are supposed to motivate it — and that makes FDT not a new decision theory so much as a promissory note.
Indeed, on the most plausible ways of cashing this out, it doesn’t give the conclusions that Y&S would want. If I imagine the closest world in which 6288 + 1048 = 7336 is false (Y&S’s example), I imagine a world with laws of nature radically unlike ours — because the laws of nature rely, fundamentally, on the truths of mathematics, and if one mathematical truth is false then either (i) mathematics as a whole must be radically different, or (ii) all mathematical propositions are true because it is simple to prove a contradiction and every proposition follows from a contradiction. Either way, when I imagine worlds in which FDT outputs something different than it in fact does, then I imagine valueless worlds (no atoms or electrons, etc) — and this isn’t what Y&S are wanting us to imagine.
Alternatively (as Abram Demski suggested to me in a comment), Y&S could accept that the decision-maker should two-box in the cases given above. But then, it seems to me, FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.
V. Implausible discontinuities
A related problem is as follows: FDT treats ‘mere statistical regularities’ very differently from predictions. But there’s no sharp line between the two. So it will result in implausible discontinuities. There are two ways we can see this.
First, take some physical process S (like the lesion from the Smoking Lesion) that causes a ‘mere statistical regularity’ (it’s not a Predictor). And suppose that the existence of S tends to cause both (i) one-boxing tendencies and (ii) the presence of money in the opaque box when decision-makers face Newcomb problems. If it’s S alone that results in the Newcomb set-up, then FDT will recommend two-boxing.
But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing. But this seems arbitrary: why should the fact that S’s causal influence on whether there’s money in the opaque box or not goes via another agent make such a big difference? And we can think of all sorts of spectrum cases in between the ‘mere statistical regularity’ and the full-blooded Predictor: What if the ‘predictor’ is a very unsophisticated agent that doesn’t even understand the implications of what they’re doing? What if they only partially understand the implications of what they’re doing? For FDT, there will be some point of sophistication at which the agent moves from simply being a conduit for a causal process to instantiating the right sort of algorithm, and suddenly FDT will switch from recommending two-boxing to recommending one-boxing.
Second, consider that same physical process S, and consider a sequence of Newcomb cases, each of which makes S more complicated and agent-y, and so progressively more similar to a Predictor making predictions. On FDT, there will be some point in the sequence at which there’s a sharp jump: prior to that point, FDT would recommend that the decision-maker two-boxes; after that point, FDT would recommend that the decision-maker one-boxes. But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.
VI. FDT is deeply indeterminate
Even putting the previous issues aside, there’s a fundamental way in which FDT is indeterminate, which is that there’s no objective fact of the matter about whether two physical processes A and B are running the same algorithm or not, and therefore no objective fact of the matter of which correlations represent implementations of the same algorithm or are ‘mere correlations’ of the form that FDT wants to ignore. (Though I’ll focus on ‘same algorithm’ cases, I believe that the same problem would affect accounts of when two physical processes are running similar algorithms, or any way of explaining when the output of some physical process, which instantiates a particular algorithm, is Y&S-subjunctively dependent on the output of another physical process, which instantiates a different algorithm.)
To see this, consider two calculators. The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not? Well, perhaps on this foreign calculator the ‘–’ symbol means what we usually take it to mean — namely, that the ensuing number is negative — and therefore every time we hit the ‘=’ button on the second calculator we are asking it to run the algorithm ‘compute the sum entered, then output the negative of the answer’. If so, then the calculators are systematically running different algorithms.
But perhaps, in this foreign land, the ‘–’ symbol, in this context, means that the ensuing number is positive and the lack of a ‘–’ symbol means that the number is negative. If so, then the calculators are running exactly the same algorithms; their differences are merely notational.
Ultimately, in my view, all we have, in these two calculators, are just two physical processes. The further question of whether they are running the same algorithm or not depends on how we interpret the physical outputs of the calculator. There is no deeper fact about whether they’re ‘really’ running the same algorithm or not. And in general, it seems to me, there’s no fact of the matter about which algorithm a physical process is implementing in the absence of a particular interpretation of the inputs and outputs of that physical process.
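The calculator example can be sketched in code. Here the "physical process" is just the string of glyphs a device shows, and an "interpretation" is a separate function from glyphs to numbers. All names are hypothetical and introduced for illustration; the point is only that the same physical outputs count as the same or a different computation depending on the reading chosen.

```python
def display(a: int, b: int, foreign: bool) -> str:
    """Glyphs shown after entering a + b; the foreign device inverts the '-' glyph."""
    total = a + b
    shows_minus = (total < 0) != foreign
    return ("-" if shows_minus else "") + str(abs(total))


def read_standard(glyphs: str) -> int:
    """Interpretation 1: '-' means the number is negative."""
    return int(glyphs)


def read_foreign(glyphs: str) -> int:
    """Interpretation 2: '-' means positive, and its absence means negative."""
    return -int(glyphs)


if __name__ == "__main__":
    shown = display(2, 3, foreign=True)
    # Same glyphs, two readings: "negated sum" versus "plain sum".
    print(shown, read_standard(shown), read_foreign(shown))
```

Read with `read_standard`, the foreign device computes the negative of the sum, a different function from the familiar calculator. Read with `read_foreign`, it computes exactly the same function. Nothing in `display` itself settles which reading is correct.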
But if that’s true, then, even in the Newcomb cases where a Predictor is simulating you, it’s a matter of choice of symbol-interpretation whether the predictor ran the same algorithm that you are now running (or a representation of that same algorithm). And the way you choose that symbol-interpretation is fundamentally arbitrary. So there’s no real fact of the matter about whether the predictor is running the same algorithm as you. It’s indeterminate how you should act, given FDT: you should one-box, given one way of interpreting the inputs and outputs of the physical process the Predictor is running, but two-box given an alternative interpretation.
Now, there’s a bunch of interesting work on concrete computation, trying to give an account of when two physical processes are performing the same computation. The best response that Y&S could make to this problem is to provide a compelling account of when two physical processes are running the same algorithm that gives them the answers they want. But almost all accounts of computation in physical processes have the issue that very many physical processes are running very many different algorithms, all at the same time. (Because most accounts rely on there being some mapping from physical states to computational states, and there can be multiple mappings.) So you might well end up with the problem that in the closest (logically impossible) world in which FDT outputs something other than what it does output, not only do the actions of the Predictor change, but so do many other aspects of the world. For example, if the physical process underlying some aspect of the US economy just happened to be isomorphic with FDT’s algorithm, then in the logically impossible world where FDT’s algorithm produces a different output, not only does the Predictor act differently, but so does the US economy. And that will probably change the value of the world under consideration, in a way that’s clearly irrelevant to the choice at hand.
VII. But FDT gets the most utility!
Y&S regard the most important criterion to be ‘utility achieved’, and think that FDT does better than all its rivals in this regard. Though I agree with something like the spirit of this criterion, its use by Y&S is unhelpfully ambiguous. To help explain this, I’ll go on a little detour to present some distinctions that are commonly used by academic moral philosophers and, to a lesser extent, decision theorists. (For more on these distinctions, see Toby Ord’s DPhil thesis.)
Evaluative focal points
An evaluative focal point is an object of axiological or normative evaluation. (‘Axiological’ means ‘about goodness/badness’; ‘normative’ means ‘about rightness/wrongness’. If you’re a consequentialist, x is best iff it’s right, but if you’re a non-consequentialist the two can come apart.) When doing moral philosophy or decision theory, the most common evaluative focal points are acts, but we can evaluate other things too: characters, motives, dispositions, sets of rules, beliefs, and so on.
Any axiological or normative theory needs to specify which focal point it is evaluating. The theory can evaluate a single focal point (e.g. act utilitarianism, which only evaluates acts) or many (e.g. global utilitarianism, which evaluates everything).
The theory can also differ on whether it is direct or indirect with respect to a given evaluative focal point. For example, Hooker’s rule-consequentialism is a direct theory with respect to sets of rules, and an indirect theory with respect to acts: it evaluates sets of rules on the basis of their consequences, but evaluates acts with respect to how they conform to those sets of rules. Because of this, on Hooker’s view, the right act need not maximize good consequences.
Criterion of rightness vs decision procedure
In chess, there’s a standard by which it is judged who has won the game, namely, the winner is whoever first puts their opponent’s king into checkmate. But relying solely on that standard of evaluation isn’t going to go very well if you actually want to win at chess. Instead, you should act according to some other set of rules and heuristics, such as: “if white, play e4 on the first move,” “don’t get your Queen out too early,” “rooks are worth more than bishops” etc.
A similar distinction can be made for axiological or normative theories. The criterion of rightness, for act utilitarianism, is, “The right actions are those actions which maximize the sum total of wellbeing.” But that’s not the decision procedure one ought to follow. Instead, perhaps, you should rely on rules like ‘almost never lie’, ‘be kind to your friends and family’, ‘figure out how much you can sustainably donate to effective charities, and do that,’ and so on.
For some people, in fact, learning that utilitarianism is true will cause one to be a worse utilitarian by the utilitarian’s criterion of rightness! (Perhaps you start to come across as someone who uses others as means to an end, and that hinders your ability to do good.) By the utilitarian criterion of rightness, someone could in principle act rightly in every decision, even though they have never heard of utilitarianism, and therefore never explicitly tried to follow utilitarianism.
These distinctions and FDT
From Y&S, it wasn’t clear to me whether FDT is really meant to assess acts, agents, characters, decision procedures, or outputs of decision procedures, and it wasn’t clear to me whether it is meant to be a direct or an indirect theory with respect to acts, or with respect to outputs of decision procedures. This is crucial, because it’s relevant to which decision theory ‘does best at getting utility’.
With these distinctions in hand, we can see that Y&S employ multiple distinct interpretations of their key criterion. Sometimes, for example, Y&S talk about how “FDT agents” (which I interpret as ‘agents who follow FDT to make decisions’) get more utility, e.g.:
“Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem.”
“We propose an entirely new decision theory, functional decision theory (FDT), that maximizes agents’ utility more reliably than CDT or EDT.”
“FDT agents attain high utility in a host of decision problems that have historically proven challenging to CDT and EDT: FDT outperforms CDT in Newcomb’s problem; EDT in the smoking lesion problem; and both in Parfit’s hitchhiker problem.”
“It should come as no surprise that an agent can outperform both CDT and EDT as measured by utility achieved; this has been known for some time (Gibbard and Harper 1978).”
“Expanding on the final argument, proponents of EDT, CDT, and FDT can all agree that it would be great news to hear that a beloved daughter adheres to FDT, because FDT agents get more of what they want out of life. Would it not then be strange if the correct theory of rationality were some alternative to the theory that produces the best outcomes, as measured in utility? (Imagine hiding decision theory textbooks from loved ones, lest they be persuaded to adopt the “correct” theory and do worse thereby!) We consider this last argument—the argument from utility—to be the one that gives the precommitment and value-of-information arguments their teeth. If self-binding or self-blinding were important for getting more utility in certain scenarios, then we would plausibly endorse those practices. Utility has primacy, and FDT’s success on that front is the reason we believe that FDT is a more useful and general theory of rational choice.”
Sometimes Y&S talk about how different decision theories produce more utility on average if they were to face a specific dilemma repeatedly:
“Measuring by utility achieved on average over time, CDT outperforms EDT in some well-known dilemmas (Gibbard and Harper 1978), and EDT outperforms CDT in others (Ahmed 2014b).”
“Imagine an agent that is going to face first Newcomb’s problem, and then the smoking lesion problem. Imagine measuring them in terms of utility achieved, by which we mean measuring them by how much utility we expect them to attain, on average, if they face the dilemma repeatedly. The sort of agent that we’d expect to do best, measured in terms of utility achieved, is the sort who one-boxes in Newcomb’s problem, and smokes in the smoking lesion problem.”
Sometimes Y&S talk about which agent will achieve more utility ‘in expectation’, though they don’t define the point at which they gain more expected utility (or what notion of ‘expected utility’ is being used):
“One-boxing in the transparent Newcomb problem may look strange, but it works. Any predictor smart enough to carry out the arguments above can see that CDT and EDT agents two-box, while FDT agents one-box. Followers of CDT and EDT will therefore almost always see an empty box, while followers of FDT will almost always see a full one. Thus, FDT agents achieve more utility in expectation.”
Sometimes they talk about how much utility ‘decision theories tend to achieve in practice’:
“It is for this reason that we turn to Newcomblike problems to distinguish between the three theories, and demonstrate FDT’s superiority, when measuring in terms of utility achieved.”
“we much prefer to evaluate decision theories based on how much utility they tend to achieve in practice.”
Sometimes they talk about how well the decision theory does in a circumscribed class of cases (though they note in footnote 15 that they can’t define what this class of cases is):
“FDT does appear to be superior to CDT and EDT in all dilemmas where the agent’s beliefs are accurate and the outcome depends only on the agent’s behavior in the dilemma at hand. Informally, we call these sorts of problems “fair problems.””
“FDT, we claim, gets the balance right. An agent who weighs her options by imagining worlds where her decision function has a different output, but where logical, mathematical, nomic, causal, etc. constraints are otherwise respected, is an agent with the optimal predisposition for whatever fair dilemma she encounters.”
And sometimes they talk about how much utility the agent would receive in different possible worlds than the one she finds herself in:
“When weighing actions, Fiona simply imagines hypotheticals corresponding to those actions, and takes the action that corresponds to the hypothetical with higher expected utility—even if that means imagining worlds in which her observations were different, and even if that means achieving low utility in the world corresponding to her actual observations.”
As we can see, the most common formulation of this criterion is that they are looking for the decision theory that, if run by an agent, will produce the most utility over their lifetime. That is, they’re asking what the best decision procedure is, rather than what the best criterion of rightness is, and are providing an indirect account of the rightness of acts, assessing acts in terms of how well they conform with the best decision procedure.
But, if that’s what’s going on, there are a whole bunch of issues to dissect. First, it means that FDT is not playing the same game as CDT or EDT, which are proposed as criteria of rightness, directly assessing acts. So it’s odd to have a whole paper comparing them side-by-side as if they are rivals.
Second, what decision theory does best, if run by an agent, depends crucially on what the world is like. To see this, let’s go back to the question that Y&S ask of what decision theory I’d want my child to have. This depends on a whole bunch of empirical facts: if she might have a gene that causes cancer, I’d hope that she adopts EDT; though if, for some reason, I knew whether or not she did have that gene and she didn’t, I’d hope that she adopts CDT. Similarly, if there were long-dead predictors who can no longer influence the way the world is today, then, if I didn’t know what was in the opaque boxes, I’d hope that she adopts EDT (or FDT); if I did know what was in the opaque boxes (and she didn’t) I’d hope that she adopts CDT. Or, if I’m in a world where FDT-ers are burned at the stake, I’d hope that she adopts anything other than FDT.
Third, the best decision theory to run is not going to look like any of the standard decision theories. I don’t run CDT, or EDT, or FDT, and I’m very glad of it; it would be impossible for my brain to handle the calculations of any of these decision theories every moment. Instead I almost always follow a whole bunch of rough-and-ready and much more computationally tractable heuristics; and even on the rare occasions where I do try to work out the expected value of something explicitly, I don’t consider the space of all possible actions and all states of nature that I have some credence in — doing so would take years.
So the main formulation of Y&S’s most important principle doesn’t support FDT. And I don’t think that the other formulations help much, either. Criteria of how well ‘a decision theory does on average and over time’, or ‘when a dilemma is issued repeatedly’ run into similar problems as the primary formulation of the criterion. Assessing by how well the decision-maker does in possible worlds that she isn’t in fact in doesn’t seem a compelling criterion (and EDT and CDT could both do well by that criterion, too, depending on which possible worlds one is allowed to pick).
Fourth, arguing that FDT does best in a class of ‘fair’ problems, without being able to define what that class is or why it’s interesting, is a pretty weak argument. And, even if we could define such a class of cases, claiming that FDT ‘appears to be superior’ to EDT and CDT in the classic cases in the literature is simply begging the question: CDT adherents claim that two-boxing is the right action (which gets you more expected utility!) in Newcomb’s problem; EDT adherents claim that smoking is the right action (which gets you more expected utility!) in the smoking lesion. The question is which of these accounts is the right way to understand ‘expected utility’; they’ll therefore all differ on which of them does better in terms of getting expected utility in these classic cases.
Finally, in a comment on a draft of this note, Abram Demski said that: “The notion of expected utility for which FDT is supposed to do well (at least, according to me) is expected utility with respect to the prior for the decision problem under consideration.” If that’s correct, it’s striking that this criterion isn’t mentioned in the paper. But it also doesn’t seem compelling as a principle by which to choose between decision theories, nor does it seem that FDT even does well by it. To see both points: suppose I’m choosing between an avocado sandwich and a hummus sandwich, and my prior was that I prefer avocado, but I’ve since tasted them both and gotten evidence that I prefer hummus. The choice that does best in terms of expected utility with respect to my prior for the decision problem under consideration is the avocado sandwich (and FDT, as I understood it in the paper, would agree). But, uncontroversially, I should choose the hummus sandwich, because I prefer hummus to avocado.
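The sandwich case can be put as a small calculation. What follows is only a rough sketch: the credences and utilities are numbers I’ve made up for illustration (they don’t come from Y&S or from Demski’s comment), and `expected_utility` and `best_choice` are hypothetical helper names. It shows that the option that maximises expected utility relative to the prior can differ from the option that maximises it relative to the credences I have after tasting both sandwiches.

```python
# Toy model of the sandwich choice. All numbers are illustrative assumptions.
# Two hypotheses about my tastes, and the utility of each choice under each.
UTILITY = {
    "prefers_avocado": {"avocado": 1.0, "hummus": 0.0},
    "prefers_hummus": {"avocado": 0.0, "hummus": 1.0},
}
CHOICES = ("avocado", "hummus")


def expected_utility(choice, credence):
    """Expected utility of a choice, given credences over the hypotheses."""
    return sum(p * UTILITY[hypothesis][choice] for hypothesis, p in credence.items())


def best_choice(credence):
    """The choice with the highest expected utility under the given credences."""
    return max(CHOICES, key=lambda choice: expected_utility(choice, credence))


# Before tasting: I mostly think I prefer avocado (assumed numbers).
prior = {"prefers_avocado": 0.8, "prefers_hummus": 0.2}
# After tasting both: strong evidence that I prefer hummus (assumed numbers).
posterior = {"prefers_avocado": 0.05, "prefers_hummus": 0.95}

print("Best by prior:    ", best_choice(prior))
print("Best by posterior:", best_choice(posterior))
```

On these assumed numbers the prior-relative criterion picks the avocado sandwich and the posterior-relative one picks the hummus sandwich, which is the divergence the example turns on.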
VIII. An alternative approach that captures the spirit of FDT’s aims
Academic decision theorists tend to focus on what actions are rational, but not talk very much about what sort of agent to become. Something that’s distinctive and good about the rationalist community’s discussion of decision theory is that there’s more of an emphasis on what sort of agent to be, and what sorts of rules to follow.
But this is an area where we can eat our cake and have it. There’s nothing to stop us assessing agents, acts and anything else we like in terms of our favourite decision theory.
Let’s define: Global expected utility theory =df for any x that is an evaluative focal point, the right x is that which maximises expected utility.
I think that Global CDT can get everything we want, without the problems that face FDT. Consider, for example, the Prisoner’s Dilemma. On the global version of CDT, we can say both that (i) the act of defecting is the right action (assuming that the other agent will use their money poorly); and that (ii) the right sort of person to be is one who cooperates in prisoner’s dilemmas.
(ii) would be true, even though (i) is true, if you will face repeated prisoner’s dilemmas, if whether or not you find yourself in opportunities to cooperate depends on whether or not you’ve cooperated in the past, if other agents can tell what sort of person you are even independently of your actions in Prisoner’s Dilemmas, and so on. Similar things can be said about blackmail cases and about Parfit’s Hitchhiker. And similar things can be said more broadly about what sort of person to be given consequentialism: if you become someone who keeps promises, doesn’t tell lies, sticks up for their friends (etc), and who doesn’t analyse these decisions in consequentialist terms, you’ll do more good than someone who tries to apply the consequentialist criterion of rightness for every decision.
(Sometimes behaviour like this is described as ‘rational irrationality’. But I don’t think that’s an accurate description. It’s not that one and the same thing (the act) is both rational and irrational. Instead, we continue to acknowledge that the act is the irrational one; we just also acknowledge that it results from the disposition that it is rational to have.)
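The difference between evaluating the act and evaluating the disposition in the Prisoner’s Dilemma can be illustrated with a toy model. Everything in it is an assumption I’m adding for illustration: the payoff numbers, the idea that the other agent reads my disposition correctly with probability `read_accuracy` and cooperates only if they read me as a cooperator, and the function names. The sketch is meant to show only that, with causal evaluation at both levels, defecting can be the best act while being a cooperator is the best disposition.

```python
# Payoff to me, indexed by (my_act, their_act). Illustrative values only.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}


def best_act(their_act):
    """Act-level evaluation: hold the other agent's act fixed and pick my best act."""
    return max("CD", key=lambda my_act: PAYOFF[(my_act, their_act)])


def disposition_value(my_disposition, read_accuracy):
    """Disposition-level evaluation (hypothetical setup).

    Assumes I act on my disposition, and that the other agent reads my
    disposition correctly with probability read_accuracy, cooperating if
    and only if they read me as a cooperator.
    """
    their_act_if_read_right = my_disposition
    their_act_if_read_wrong = "D" if my_disposition == "C" else "C"
    return (
        read_accuracy * PAYOFF[(my_disposition, their_act_if_read_right)]
        + (1 - read_accuracy) * PAYOFF[(my_disposition, their_act_if_read_wrong)]
    )


# (i) Whatever the other agent does, defecting is my best act.
print("Best act vs C:", best_act("C"), "| vs D:", best_act("D"))

# (ii) If others read me fairly reliably, the cooperative disposition does better.
for accuracy in (0.5, 0.9):
    coop = disposition_value("C", accuracy)
    defect = disposition_value("D", accuracy)
    print(f"accuracy={accuracy}: cooperator={coop:.2f}, defector={defect:.2f}")
```

With these payoffs the cooperative disposition comes out ahead only when the other agent reads me correctly more than 5/7 of the time, which fits the point that (ii) depends on empirical facts such as whether others can tell what sort of person you are.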
There are other possible ways of capturing some of the spirit of FDT, such as a sort of rule-consequentialism, where the right set of rules to follow are those that would produce the best outcome if all agents followed those rules, and the right act is that which conforms to that set of rules. But I think that global causal decision theory is the most promising idea in this space.
In this note, I argued that FDT faces multiple major problems. In my view, these are fatal to FDT in its current form. I think it’s possible that, with very major work, a version of FDT could be developed that could overcome some of these problems (in particular, the problems described in sections IV, V and VI, which are based, in one way or another, on the issue of when two processes are Y&S-subjunctively dependent on one another). But it’s hard to see what the motivation for doing so is: FDT in any form will violate Guaranteed Payoffs, which should be one of the most basic constraints on a decision theory; and if, instead, we want to seriously undertake the project of working out which decision procedure is best for an agent to run (or ‘what should we code into an AI?’), the answer will be far messier, and far more dependent on particular facts about the world and the computational resources of the agent in question, than any of EDT, CDT or FDT.