Functional Decision Theory: Not Even Wrong, Also Wrong

Crosspost.

1 The analytic philosophers vs the rationalists

A lot of analytic philosophers are sympathetic to Rationalism (the social movement, not the alternative to empiricism). I don’t know if I’m senior enough yet to count as a philosopher, but I certainly count myself as among those sympathetic. Yet virtually all of them have the same complaint: Rationalists very often make philosophical errors, especially when it comes to decision theory.

The Rationalist community, for those unaware, is a group devoted to forming beliefs rationally. They disproportionately live in the Bay Area, post on LessWrong, think AI is going to be a big deal, adopt various reductionist philosophical views, etc. I’ve written about my thoughts about the Rationalists here—they’re very smart and interesting; they get a lot right but are sometimes overconfident and wrong about philosophy.

The Rationalist decision theory du jour is called functional decision theory (FDT). Academic decision theorists don’t like the theory. The number of academic decision theorists who adopt it could be counted on one hand by someone missing four of their fingers.1 If some tragedy occurred causing you to lose all your fingers and Ben Levinstein to lose his life, you’d still be able to count the number of academic decision theorists who endorse FDT on one hand. My position on the view is as simple as can be: I think the view is definitely wrong. It is so underspecified that it gives no real recommendations, and the recommendations that it supposedly gives are extremely implausible on their face.

I have had debates with about 5 million Rationalists on this subject. Half my time in the Bay Area was spent arguing with people about decision theory. When I sleep, I am haunted by the ghosts of FDTers. If you keep saying some point over and over again, it sometimes makes sense to write it up. I thought I’d do that. But if you want to read more from other people who are better at decision theory than me, and also more sensible and measured, read Will MacAskill’s great piece and also Wolfgang Schwarz’s piece.

2 What the heck is FDT?

(Skip this section if you know what each of the main decision theories is.)

Decision theories tell you how to get what you want. Specifically, they tell you how to reason about cases where different options get you different amounts of what you want. (The amounts of what you want are measured in units of utility. This doesn’t have anything to do with utilitarianism, the moral theory; it just denotes the amount of whatever it is that you’re optimizing for.)

There are two major decision theories that academic philosophers like. One is called causal decision theory (CDT). I’m trying to be impartial, so I won’t tell you that it’s (probably) the correct view. It says that you should take the action that causes you to have the most utility. Specifically, it says that when taking an action, you should ignore non-causal influences that your actions might have on the state of the world and only do what causes the best thing.

There’s a second view called evidential decision theory (EDT). It says that you should take the action which leaves you with the expectation of having the most utility. So when deciding between acts A and B, ask: how much utility would I expect to have if I take A? What about if I take B? If you’d expect to have more if you took A than B, then you should take A. If you’d expect to have more if you took B, then you should take B.

Functional decision theory is different from either. It says you should think of your action as determining the output of your decision algorithm. You should take the act such that, across time, you expect to get the most utility if your algorithm outputs that act.

So EDT asks: what action leaves me with the expectation that I’ll be richest? CDT asks: what action causes me to be the richest? And FDT asks: which output of my algorithm would leave me expecting to be richest, if that output had been settled at the start of time?

Here’s a famous case to distinguish the theories. It’s called Newcomb’s problem. It’s the most famous dilemma in decision theory.

There are two boxes, A and B. You have the option of either taking just A or both A and B. B has $1,000. One hour ago, a very accurate predictor guessed whether you would take both boxes or just box A. If he predicted you would just take box A, he put $2,000 in box A. If he predicted you’d take both boxes, he put nothing in box A.

Question: should you take both boxes or just box A?

CDTers say: both boxes. Taking the second box causes you to get an extra $1,000. The fact that it correlates with there being less money in the box is irrelevant. By taking one box, CDTers claim, you’re just passing up an extra $1,000.

EDTers say: just one box. If you take just the first box, you’ll generally end up with $2,000 instead of $1,000. EDTers say: you expect to end up with more money if you take one box, so you should take one box!

FDTers say: it depends on how the predictor predicts what you’ll do. Suppose they run your algorithm or an algorithm very much like yours to predict what you’ll do. Well then, by changing the results of your algorithm, you change their prediction. So then you should one-box. The output of your algorithm, then, determines how much money is in the box—FDT thinks of your decisions as determining the results of your algorithm.

But suppose instead that they make predictions by looking at some other characteristic that merely correlates with one-boxing. E.g. maybe they look at whether you had a professor that two-boxed. In this case, FDT says you should two box. The predictor isn’t running your algorithm, so changing the outcome of the algorithm doesn’t change what is in the box.
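The CDT and EDT verdicts above come down to two different expected-value calculations. Here is a minimal Python sketch of that arithmetic, using the payoffs from the case as stated ($1,000 in B, $2,000 in A if one-boxing was predicted). The 99% accuracy figure and the function names are assumptions for illustration; the case only says the predictor is “very accurate.”

```python
# Payoffs from the case above: box B always holds $1,000;
# box A holds $2,000 only if one-boxing was predicted.
BOX_B = 1_000
BOX_A_IF_PREDICTED_ONE_BOX = 2_000


def edt_value(action: str, accuracy: float) -> float:
    """Expected payoff conditional on taking `action` (EDT-style).

    `accuracy` is the assumed probability that the prediction matches the action.
    """
    if action == "one-box":
        return accuracy * BOX_A_IF_PREDICTED_ONE_BOX
    # Two-boxing: box A is full only if the predictor got it wrong.
    return (1 - accuracy) * BOX_A_IF_PREDICTED_ONE_BOX + BOX_B


def cdt_value(action: str, prob_box_a_full: float) -> float:
    """Expected payoff holding the already-fixed contents of box A constant (CDT-style).

    `prob_box_a_full` is your credence that box A is full; your act cannot change it.
    """
    base = prob_box_a_full * BOX_A_IF_PREDICTED_ONE_BOX
    return base + (BOX_B if action == "two-box" else 0)


if __name__ == "__main__":
    accuracy = 0.99  # assumed; not given in the case
    print("EDT:", edt_value("one-box", accuracy), "vs", edt_value("two-box", accuracy))
    for q in (0.0, 0.5, 1.0):
        gain = cdt_value("two-box", q) - cdt_value("one-box", q)
        print(f"CDT gain from two-boxing when P(A full) = {q}: {gain}")
```

With these assumed numbers, the conditional calculation favors one-boxing, while the CDT-style calculation favors two-boxing by exactly $1,000 whatever credence you have that box A is full.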

So what’s wrong with FDT? I have two main gripes: what FDT says is wildly underspecified—there’s no remotely plausible way to fill in the details. Also, the few judgments that FDT supposedly gives are often wildly implausible!

3 FDT doesn’t say anything

The biggest problem with FDT is that it is devoid of genuine content.

3.1 Is there a fact about how other functions would be different in the impossible world where mine was?

FDT says that when taking an action, you should consider how the world would be if your decision procedure gave some recommendation. But what does that mean? Specifically, suppose that you are kind of like me but different in a bunch of respects. Maybe you’re my brother. Maybe you’re Claude Opus 4.7 and I’m Claude Opus 4.6. Maybe you’re an almost exact copy of me. Does changing my algorithm change your algorithm? How could we possibly answer this question?

Remember, my decision algorithm is some mathematical function. So we’re asked to imagine in the mathematically impossible world where some math function outputted something different from what it mathematically has to output, whether other mathematical functions would be different. What could this mean? How could there possibly be an answer to this question? How can you have a theory that depends on there being determinate answers to the question: in the logically impossible world where some necessary mathematical fact was different, how would other necessary mathematical facts be different? What?

FDTers often claim that CDT requires considering counterpossibles too, because it instructs you to hold fixed what the world is independent of your choice and then make the decision that maximizes utility with respect to that. Now, even if this is right, it’s a lot sketchier to consider how other algorithms would be different in counterpossible worlds than just considering irrelevant features of generic counterpossibles. But CDT holds fixed only which things causally depend on your act, not the initial conditions. So it never has to consider a situation where, say, the initial conditions determine that you’ll take some act A, yet you take act B. As Wolfgang Schwarz put it:

For another example, Yudkowsky and Soares claim that CDT (like FDT) involves evaluating logically impossible scenarios. For example, “[CDTers] are asking us to imagine the agent’s physical action changing while holding fixed the behavior of the agent’s decision function”. Who says that? I would have thought that when we consider what would happen if you took one box in Newcomb’s Problem, the scenario we’re considering is one in which your decision function outputs one-boxing. We’re not considering an impossible scenario in which your decision function outputs two-boxing, you have complete control over your behaviour, and yet you choose to one-box. There are many detailed formulations of CDT. Yudkowsky and Soares ignore almost all of them and only mention the comparatively sketchy theory of Pearl. But even Pearl’s theory plausibly doesn’t appeal to impossible propositions to evaluate ordinary options. Lewis’s or Joyce’s or Skyrms’s certainly doesn’t.

And note: this isn’t just some minor quibble with what FDT says in a few cases. This is the core mechanic of FDT. This is what FDT needs to generate a single result in a single case! Every case where FDT gives a recommendation, it does so by analyzing the counterfactual where the output of a mathematical function was different. Insofar as there’s no fact of the matter about that, FDT doesn’t give any recommendations in any cases.

Let’s apply this to Newcomb’s problem. Suppose the predictor predicts what I’ll do by running an algorithm. Presumably it won’t be exactly the same algorithm as the one I’m employing. He’s not running an exact mental simulation of me even if his simulation reliably correlates with what I’ll do. Suppose my algorithm will in fact output one-boxing. FDT requires we answer: in the logically impossible world where my algorithm outputted two-boxing, would the predictor’s algorithm output two-boxing? Clearly there’s no fact of the matter about that! So FDT doesn’t even get clear results in Newcomb’s problem! As long as the predictor isn’t running an exact simulation of you, FDT falls silent on the question of what you should do.

3.2 Statistical correlations aren’t enough

Now, there’s an obvious-sounding solution to this problem. Just consider the nearest epistemically possible world where your decision theory outputs some recommendation, and then tabulate the amount of utility you expect to get. So suppose that you learned that your algorithm was disposed to two-box. Then ask: how much money would you expect to get? Compare that to how much you’d expect to get if you learned your algorithm was disposed to one-box. If you’d expect to have more after learning your algorithm one-boxes than two-boxes, then you should one-box.

But this obvious-sounding solution doesn’t work. It makes the theory into updateless EDT.2 To see this, imagine that some people are born with a gene that correlates heavily with two-boxing. The predictor predicts what I’ll do by looking at whether I have the gene. Two-boxing doesn’t cause the gene or affect whether I have the gene in any way. This solution would recommend one-boxing in this case. If I knew that my algorithm was disposed to one-box, I’d have a high credence in my lacking the gene, and in my getting rich. But FDT isn’t supposed to say that!

In fact, this leaves FDT vulnerable to the very smoker’s lesion result that FDTers take to be decisive against EDT. Imagine that smoking doesn’t cause your health to be worse. Instead, smoking correlates with having a lesion on your lung that both makes you likelier to smoke and makes your health worse. It seems rational to smoke, because smoking has no effect on whether you have the lesion on your lung. Yet if your algorithm outputs smoking, that makes you expect that you have the lesion, and so it lowers the expected utility that you get according to this solution.
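To see how the epistemic fix goes wrong in the smoker’s lesion case, here is a rough Python sketch contrasting the conditional expectation (what the fix computes) with a calculation that holds the lesion probability fixed. All the numbers and names are hypothetical; the case itself gives no figures.

```python
SMOKING_PLEASURE = 10   # hypothetical utility of smoking
LESION_COST = -100      # hypothetical utility of having the lesion


def conditional_value(smokes: bool,
                      p_lesion_given_smoke: float = 0.8,
                      p_lesion_given_abstain: float = 0.2) -> float:
    """Expected utility conditional on what your algorithm outputs.

    Learning that you smoke raises your credence in the lesion,
    even though smoking has no effect on whether you have it.
    """
    p_lesion = p_lesion_given_smoke if smokes else p_lesion_given_abstain
    return (SMOKING_PLEASURE if smokes else 0) + p_lesion * LESION_COST


def causal_value(smokes: bool, p_lesion: float = 0.5) -> float:
    """Expected utility with the lesion probability held fixed across acts."""
    return (SMOKING_PLEASURE if smokes else 0) + p_lesion * LESION_COST


if __name__ == "__main__":
    print("conditional:", conditional_value(True), "vs", conditional_value(False))
    print("causal:     ", causal_value(True), "vs", causal_value(False))
```

On these made-up numbers the conditional calculation tells you not to smoke, while the fixed-probability calculation tells you to smoke for any lesion probability you plug in.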

Now, you could modify the view once again so that you only analyze your expectations concerning other algorithms. This way, you wouldn’t look at how much utility you’d expect to get if your algorithm outputted some action. Or, at the very least, you wouldn’t take the action which, if your algorithm outputs, leaves you with the highest amount of expected utility. Instead, when deciding between two actions A and B, you’d imagine:

  1. Your algorithm outputting A vs B.

  2. What you expect other algorithms to output if yours outputs A vs if yours outputs B.

  3. Then you count up the utility from you and other algorithms outputting A vs B. Whichever one leaves you with more utility timelessly (we’ll come back to the timeless thing later) is the one you take.

That way, you only analyze your algorithm’s probabilistic impact on other algorithms. Whether you have lung cancer is not an algorithm. So you don’t treat your algorithm being different as affecting it in the way relevant to decision making.

But this is of no help. Imagine a modified case where the lesion doesn’t make your health worse. Instead, there’s an algorithm that checks to see if you have the lesion. If you do, then it makes your health worse and also makes you likelier to smoke. Now there’s an algorithm in the mix, so this view is back to thinking (wrongly, and contrary to the spirit of FDT) that you shouldn’t smoke. After all, your algorithm outputting “don’t smoke” makes you expect that the other algorithm’s output was the one that leaves you less likely to smoke and in better health.

So now the FDTers are in pretty rough shape. They need to have some account of how your algorithm outputting A would affect other different algorithms. But this can’t just be about your credence in the other algorithm having some outcome, conditional on yours outputting A. FDT depends on analyzing how your action being different (counterpossibly) would make other algorithms different (counterpossibly) without looking at how likely it is that other algorithms would be different in the nearest epistemically possible world where yours is different. How could there possibly be a satisfying solution to this problem?

What it needs is some precise specification of how similar two algorithms are that doesn’t depend on:

  1. Extraneous factors (e.g. you wouldn’t want to say that how much my algorithm being different affects other agents running algorithms depends on whether they and I make similar jokes).

  2. The degree of correlation between ultimate decisions.

But what could it possibly depend on? Isn’t it obvious that there’s no single privileged joint-carving way to decide the similarity of algorithms that doesn’t just look at the statistical correlation between their outputs? Certainly FDTers owe us some account of how this works. It doesn’t do to call it an unsolved problem, when this is the entire engine of the theory—when there’s no plausible story of what a solution would even look like, strong active reason to think there is no such solution, and a solution is needed for the theory to give any result in any case.3

Let’s be a bit more concrete. Imagine that I’m in a prisoner’s dilemma against my twin (note that my twin isn’t exactly like me but is similar). I understand having a credence in my twin cooperating conditional on my cooperating. But if we’re not talking about conditional credences, how could there be a uniquely privileged sharp fact about the non-statistical algorithmic correlation between us two?

3.3 Perfectly correlated algorithms

FDT has another pretty bad result in this vicinity. Imagine that there’s some gene that correlates 99.9% with two-boxing. The gene is not caused by two-boxing; the two are just very strongly correlated. Now imagine two different scenarios:

  1. The predictor looks to see if you have the gene. If you don’t, they put $3,000 in the first box. If you do, they put nothing in the first box. The second box has $1,000. Should you take both boxes?

  2. The predictor runs a simulation of you with 99.9% accuracy. The cases where the simulation is inaccurate are the same as the ones where there isn’t an overlap between your gene and which box you take. Thus, there is 100% overlap between the predictor’s judgment in this case and the last. The only difference is that in the last case, they look to see whether you have the gene, while in this case, they run a simulation of you. If they guess that you’ll one-box, they put $3,000 in box one, while if they guess you’ll two-box, they put nothing. The second box has $1,000. Should you take both boxes?

Here FDT’s answer is that you should two-box in the first case but not in the second case. But that’s very implausible. It runs afoul of the following principle:

Equivalent predictions: if there are two methods of prediction that always output the same predictions, your answer in Newcomb’s problem shouldn’t depend on which one was employed.

This principle strikes me as very obvious. One reason for this is that if two predictive algorithms always overlap, then when you know how one turned out, you also know how the other one would have turned out. But if you know how they’d both have turned out, then surely it doesn’t matter which one they actually used.

To see this, imagine that both kinds of predictors are employed. Then, they both send a signal that’s used to influence how much money is in the box. Whichever signal arrives first determines the amount of money in the box. Surely it doesn’t matter which arrives first? Because the two predictive methods always output the same thing, this has no bearing at all on the amount of money in the box!
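The signal-race version of the argument can be sketched in a few lines of Python. This is only an illustration under the stipulations of the two scenarios: the function names are made up, and the simulation is modeled as erring exactly where the gene and the actual choice come apart.

```python
from itertools import product

CHOICES = ("one-box", "two-box")


def other(choice: str) -> str:
    return "two-box" if choice == "one-box" else "one-box"


def gene_prediction(has_gene: bool, actual_choice: str) -> str:
    """Predictor 1: looks only at the gene (actual_choice is ignored)."""
    return "two-box" if has_gene else "one-box"


def simulation_prediction(has_gene: bool, actual_choice: str) -> str:
    """Predictor 2: a simulation that, by stipulation, is wrong exactly
    in the cases where the gene and the actual choice do not line up."""
    gene_matches_choice = gene_prediction(has_gene, actual_choice) == actual_choice
    return actual_choice if gene_matches_choice else other(actual_choice)


def box_one_contents(prediction: str) -> int:
    """$3,000 if one-boxing was predicted, nothing otherwise."""
    return 3_000 if prediction == "one-box" else 0


def contents_when_first_to_arrive(first: str, has_gene: bool, actual_choice: str) -> int:
    """Box contents when the named predictor's signal arrives first."""
    predictor = gene_prediction if first == "gene" else simulation_prediction
    return box_one_contents(predictor(has_gene, actual_choice))


if __name__ == "__main__":
    for has_gene, choice in product((True, False), CHOICES):
        via_gene = contents_when_first_to_arrive("gene", has_gene, choice)
        via_sim = contents_when_first_to_arrive("simulation", has_gene, choice)
        print(has_gene, choice, via_gene, via_sim)
```

In every combination of gene and choice, the box holds the same amount whichever signal wins the race, which is the intuition behind the equivalent predictions principle.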

3.4 No fact about whether two algorithms are the same

Things get even worse. How do we determine if two functions are running the same algorithm? I’m told this is an “unsolved problem” for FDT. There seem to be a lot of those. And remember, you can’t just look at whether they always output the same thing, because FDT distinguishes between mere correlations and paired algorithms. As Will MacAskill put it in his piece:

Even putting the previous issues aside, there’s a fundamental way in which FDT is indeterminate, which is that there’s no objective fact of the matter about whether two physical processes A and B are running the same algorithm or not, and therefore no objective fact of the matter of which correlations represent implementations of the same algorithm or are ‘mere correlations’ of the form that FDT wants to ignore.

To see this, consider two calculators. The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not? Well, perhaps on this foreign calculator the ‘–’ symbol means what we usually take it to mean — namely, that the ensuing number is negative — and therefore every time we hit the ‘=’ button on the second calculator we are asking it to run the algorithm ‘compute the sum entered, then output the negative of the answer’. If so, then the calculators are systematically running different algorithms.

But perhaps, in this foreign land, the ‘–’ symbol, in this context, means that the ensuing number is positive and the lack of a ‘–’ symbol means that the number is negative. If so, then the calculators are running exactly the same algorithms; their differences are merely notational.

Ultimately, in my view, all we have, in these two calculators, are just two physical processes. The further question of whether they are running the same algorithm or not depends on how we interpret the physical outputs of the calculator.
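MacAskill’s calculator example can be made concrete with a toy Python sketch. The two “calculators” below are fixed physical-style processes that print strings; whether they count as computing the same function depends entirely on which reading of the ‘-’ symbol you apply. The function names are hypothetical, and the sketch only covers integer addition.

```python
def familiar_calculator(a: int, b: int) -> str:
    """Displays the sum the way we are used to."""
    return str(a + b)


def foreign_calculator(a: int, b: int) -> str:
    """Displays a '-' exactly where the familiar calculator would not."""
    return str(-(a + b))


def minus_means_negative(display: str) -> int:
    """Our usual reading: '-' marks a negative number."""
    return int(display)


def minus_means_positive(display: str) -> int:
    """The alternative reading: '-' marks a positive number."""
    return -int(display)


if __name__ == "__main__":
    a, b = 2, 3
    ours = minus_means_negative(familiar_calculator(a, b))
    # Reading 1: the foreign device computes the negated sum (a different function).
    print(ours, minus_means_negative(foreign_calculator(a, b)))
    # Reading 2: the foreign device computes the ordinary sum in different notation.
    print(ours, minus_means_positive(foreign_calculator(a, b)))
```

Nothing about the displayed strings settles which reading is correct, which is the point of the example.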

Now, as Will notes, standard attempts to measure whether two algorithms are the same generally imply that one system may run many different algorithms simultaneously. If the ultimate account has to do with the mapping between inputs and outputs, then changing the output of your algorithm may have bizarre effects on other features of the world. As Will writes:

For example, if the physical process underlying some aspect of the US economy just happened to be isomorphic with FDT’s algorithm, then in the logically impossible world where FDT outputs a different algorithm, not only does the predictor act differently, but so does the US economy. And that will probably change the value of the world under consideration, in a way that’s clearly irrelevant to the choice at hand.

There’s a related problem. Suppose that there is someone who is psychologically identical to me at all times before Newcomb’s problem. In Newcomb’s problem, they one-box. Should we think of changing the results of my “algorithm” as changing the results of theirs? What could possibly determine this?

There’s a somewhat strange paradox here. Imagine that there’s someone who is psychologically identical to me at all times before the prisoner’s dilemma. I’m in a prisoner’s dilemma against them. They defect. On FDT, I should defect too. But then we’re running the same algorithm. So then I should cooperate. But then we’re running different algorithms, so I should defect.

Now, you might object that the scenario, as I’ve described, is impossible. If I’m basing my decision on theirs, then we can’t be running the exact same algorithm. Here we should imagine that my decision is not based on theirs. We should then consider the question: what action do I have most objective reason to do (as opposed to which action is best for me to do given what I know)?

3.5 Conclusion

So let’s recap. FDT needs a solution to each of the following to give almost any judgment in almost any case:

  1. Determining how one algorithm being different would affect another algorithm being different, without depending on the epistemic probability of the second being different if the first was. This also can’t depend on extraneous factors. Remember additionally that we are imagining how other things would be different in the metaphysically impossible world where some mathematical fact is different, and we can’t just rely on epistemic probabilities! This problem seems obviously fatal.

  2. Determining whether two algorithms are the same. There is no standard way of doing this, and there are deep reasons to think that any solution to this would have bizarre implications—e.g. on unrelated algorithms that happen to be isomorphic.

Then, even if we had a solution to both of those, FDT would still have two problems:

  1. It implies that even if two predictive processes are 100% correlated, it would matter which one was used in Newcomb-type problems.

  2. It generates a paradox in cases where an algorithm being the same as yours depends on what you do in some situation.

Absent a solution to the first two, FDT isn’t a theory. It’s a collection of suggestions. In every case that has ever arisen in the history of the species and all the standard thought experiments, it is wildly unclear what FDT says. There are deep reasons to think it doesn’t say anything.

4 Should you light yourself on fire for no benefit?

My answer is “no.” FDT’s answer is “yes.” Here’s the case (from Will MacAskill, though similar examples abound):

Bomb.

You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.

A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.

The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.

You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?

FDT says that you should slowly and painfully burn yourself to death. After all, having the disposition to do that makes you better off in expectation timelessly. It makes it so that probably there won’t be the bomb in the box in the first place, and you won’t have to pay $100.

But this just seems irrelevant. The bomb is in the box. I have no uncertainty about what will happen if I choose Left. In cases where you have no uncertainty about how the world is, where one action simply leaves you with less utility, you shouldn’t take that action. The fact that this case is rare doesn’t matter! It’s a crazy recommendation of FDT that it tells you to light yourself on fire when you know that if you do so, you will not benefit at all.

FDTers I’ve talked to sometimes have said this is unfairly rhetorically loaded. “It’s not for no benefit,” they claim. “The benefit comes from you being better off if your decision algorithm is disposed to make it.” But at the time you’re taking the act, you have no uncertainty about how the world is. You know what benefits will come about if you take the act: none. So this phrasing is accurate.
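The two perspectives in dispute can be laid side by side numerically. Below is a minimal Python sketch, assuming a dollar-equivalent disutility for burning to death (a hypothetical figure; the case gives none) alongside the $100 fee and the 1-in-a-trillion-trillion failure rate from the case. The function names are illustrative only.

```python
FAILURE_RATE = 1e-24      # "1 in a trillion trillion", from the case
COST_OF_RIGHT = -100      # dollars, from the case
COST_OF_BURNING = -1e9    # hypothetical dollar-equivalent of burning to death


def ex_ante_policy_value(policy: str) -> float:
    """Expected value of being disposed to follow `policy`,
    assessed before the prediction is made (the timeless view)."""
    if policy == "Left":
        # The bomb is only there if the predictor wrongly predicted Right.
        return FAILURE_RATE * COST_OF_BURNING
    # Disposed to take Right: you pay $100 whether or not the predictor erred.
    return COST_OF_RIGHT


def ex_post_value(action: str, bomb_in_left: bool = True) -> float:
    """Value of `action` once you know what is in the Left box."""
    if action == "Left":
        return COST_OF_BURNING if bomb_in_left else 0
    return COST_OF_RIGHT


if __name__ == "__main__":
    print("ex ante:", ex_ante_policy_value("Left"), "vs", ex_ante_policy_value("Right"))
    print("ex post:", ex_post_value("Left"), "vs", ex_post_value("Right"))
```

On these assumptions the Left disposition wins ex ante, which is the benefit the FDTer points to, while taking Left is far worse once you have read the note and know the bomb is there, which is the point being pressed here.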

And there are an infinite number of other similar examples. Imagine that everyone in the world is put into a deep slumber. Then, the predictor simulates you and guesses if you’ll, thirty years after waking up, painfully cut off your leg for no benefit. The simulation is highly correlated with you, so his guesses about whether you’ll cut off your leg for no benefit are 99.9% accurate. If he predicts that you’ll slice off your leg, he wakes you up. If he predicts that you won’t, then he doesn’t. Assume that waking up is very good for you.

FDT implies that because being disposed to slice off your leg for no benefit makes you likelier to wake up, you should slice off your leg thirty years later. But that just seems crazy. At the time you’re making decisions, you’re already awake. If you’re already awake, it makes no sense to slice off your leg on grounds that it makes you likelier to be awake. The odds that you’re awake are already 100%.

Note: decision theories are theories about rationality. They tell you what decisions are wise and sensible to make. They are not theories about the desirable dispositions to have or about how you should program an AI. There are interesting questions about those sorts of things, but they aren’t what decision theory is about. So don’t think to yourself “would I be better off timelessly having the disposition to slice off my leg?” Think “is it rational, at the time I’m making the decision, to cut off my leg for no benefit?” I think the answer is clear: no! I’ll talk more about this distinction in the next section.

There’s a response to this that I’ve heard from a lot of functional decision theorists. Here’s the idea: you don’t really know if you are the algorithm being simulated in Newcomb’s problem or the actual person. For all you know, you might be the simulation, in which case you outputting “one-box” leads to more utility. I find this response very bizarre:

  1. I know I’m not a simulated algorithm. The simulated algorithm isn’t conscious (we can stipulate). I am.

  2. Decision theory generally assumes that you’re self-interested. But if I’m the algorithm, then I care about algorithm me—not the version in the real world. So then I wouldn’t care about what the output of the algorithm was.

Thus, I think FDT gives incorrect recommendations.

5 Does FDT get more utility?

A claim that FDTers are fond of making is that following FDT gets you the most utility. Take the version of Newcomb’s problem where the boxes are transparent, for example. So in this case, you can peer into both boxes and see how much money is in each. In this case, both CDT and EDT recommend taking two boxes. After all, at this point you have no uncertainty about how the world is—taking two boxes leaves you with an extra $1,000. FDTers recommend you take one box, because that timelessly leaves you with more utility.

Thus, FDTers generally leave transparent Newcomb’s problem richer than either EDTers or CDTers. FDT proponents claim that FDT “gets you more utility,” and is thus the right criterion of action. I have four problems with this argument.

First, I don’t think FDT says anything in any case because it’s not a complete theory (see section 3). If FDT says nothing, it can’t get you the most utility.

Second, FDT doesn’t always get you the most utility. For example, consider the following exotic possible world: the actual world. In this one, if you hang around academic philosophers, they will think you’re silly if you adopt FDT. This will make you sad. So adopting FDT gets you less utility. Additionally, in the actual world, I would get less utility if I were an FDTer, because I find it fun to argue with FDTers about decision theory. Or imagine that the government passed a law where they tortured everyone who thought FDT was the right view. FDTers wouldn’t be better off.

Or imagine the following setup. You’re offered a box full of cash. A predictor predicts if you’d one-box or two-box in Newcomb’s problem. If they predict you’d one-box, they put nothing in. If they predict you’d two-box, they put a million dollars in. Now, suddenly, it’s the two-boxers who are rich.

These examples may seem unfair. You directly get rewarded based on things that are downstream of your decision theory. But Newcomb’s problem is also unfair in precisely this way. It ties how much money you get in a box to your judgments in a decision problem.

Now, you can get around this by narrowing the claim. You can say something like “FDT gets you the most utility with respect to the utility that’s downstream of your decision algorithm.” But similarly, CDTers can claim “CDT gets you the most utility causally,” and EDTers can claim “EDT gets you the most utility evidentially.” The different theories disagree about what kind of utility is decision-theoretically relevant. So just pounding the table and saying “my theory is best by the lights of the criterion that my theory says is decision-theoretically relevant,” is obviously question-begging.

Third, FDT isn’t actually the theory that leaves you with the most expected utility on average. In fact, in many cases, it’s EDT (perhaps updateless) that leaves you with the most expected utility. For example, in the smoker’s lesion case, EDTers tend to finish better off than other people. In smoker’s lesion, smoking correlates with worse health, but it doesn’t cause it. But EDTers are less likely to smoke, so on average they’ll have better health.

Now, FDTers’ reply will presumably be that what matters isn’t just leaving with the most utility on average. Fair enough. But then they can’t appeal to this criterion. They don’t do best by it. Which kind of utility you get the most of can’t straightforwardly tell you which decision theory is right, because the decision theories disagree about which kind of utility matters.

Fourth, this argument begs the question in a different way. Other theories make a distinction between the dispositions that are beneficial to have and the ones that are rational. For example, imagine that a highly reliable predictor checks to see if you’ll give in to blackmail for $100. If so, then he blackmails you. If not, he doesn’t. In this case, non-FDT views grant that it’s timelessly better to not give in to blackmail. They simply think that once you’re being blackmailed, the rational thing to do is to give in. At that time, you’re simply paying $100 to avoid having your life ruined.

Now, FDTers reject such a distinction. But we’ll need some argument against this distinction. Otherwise, this objection simply assumes that there’s no distinction between dispositions that are rational and ones that are beneficial. Non-FDTers have a perfectly sensible reply to this objection: in situations where you are directly rewarded for being irrational—for making some unwise decision—then of course the irrational people will be better off!

And non-FDTers have their own claim that their theory gets you the most utility. In, say, MacAskill’s bomb case, FDTers blow themselves up while CDTers and EDTers don’t. CDTers and EDTers thus leave with more utility when they’re in this situation.

Non-FDTers can grant: something FDTish might describe the kinds of dispositions you timelessly want to have, depending on how the world is. But that’s different from it being the right account of rationality. The dispositions that are beneficial aren’t necessarily the ones that are rational. Decision theories are theories of rationality, not of how to program an AI. If you are only interested in the question of how to program an AI, don’t purport to be giving a decision theory that is superior to the ones philosophers endorse.

6 Conclusion

FDT is both implausible and underbaked. It sometimes licenses setting yourself on fire for no benefit. It depends on analyzing how other algorithms would be different in the logically impossible world where your algorithm was different, but has no account of how to analyze logically impossible worlds, how to analyze what it means for your algorithm to be different, and how to analyze the impact that your algorithm being different has on other algorithms. This isn’t a minor technicality—it means that there is literally no situation where we can derive the correct answer from the theory.

Permit me to go slightly meta for a moment. Ideas like FDT are not unknown to academic philosophers. Various ideas in the vicinity have been proposed. Indeed, a view like FDT—where you one-box in Newcomb’s problem even if the boxes are transparent—is intuitive to a lot of people. But the view is pretty widely rejected because it doesn’t really hold up when you scrutinize it and filling in the details is very difficult. There’s a line by Scott Alexander that I sometimes think of:

My heuristic is that when the mainstream consensus refuses to engage with a critique and hem and haw about it being “problematic”, they are usually wrong. But when they explicitly declare “This is incorrect” and write papers explaining their reasoning, they are usually right.

The response from academic philosophers has been more in the direction of “write papers explaining their reasoning.” FDTers who think their theory is unfairly neglected by the experts need some explanation of why the academic philosophers who hear of FDT nearly always think it’s wrong.

Among laypeople who hear about decision theory, lots of them adopt something FDTish. So you need some explanation of why it is that nearly all the decision-theory experts—who write monstrously complicated papers with math that would go over your head—think FDT is wrong, but intro philosophy students who know almost nothing about decision theory think that it’s right. In general, you should be skeptical of views that are rejected by ~100% of relevant experts, even after considering them at length.