# A Defense of Functional Decision Theory

This post attempts to refute an article that critiques Functional Decision Theory (FDT). If you’re new to FDT, I recommend reading this introductory paper by Eliezer Yudkowsky & Nate Soares (Y&S). The critique I attempt to refute can be found here: A Critique of Functional Decision Theory by wdmacaskill. I strongly recommend reading it before reading this response post.

The article starts with descriptions of Causal Decision Theory (CDT), Evidential Decision Theory (EDT) and FDT itself. I’ll skip straight to its critique of FDT, which is the only part I discuss in this post.

## “FDT sometimes makes bizarre recommendations”

The article claims “FDT sometimes makes bizarre recommendations”, and more specifically, that FDT violates guaranteed payoffs. The following example problem, called Bomb, is given to illustrate this remark:

“Bomb.

You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it. A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty. The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left. You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?”

The article answers and comments on the answer as follows:

“The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left. In contrast, the right action on CDT or EDT is to take Right.

The recommendation is implausible enough. But if we stipulate that in this decision-situation the decision-maker is certain in the outcome that her actions would bring about, we see that FDT violates Guaranteed Payoffs.”

I agree FDT recommends taking the left box. I disagree that it violates some principle every decision theory should adhere to. Left-boxing really is the right decision in Bomb. Why? Let’s ask ourselves the core question of FDT:

“Which output of this decision procedure causes the best outcome?”
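
One way to make this question concrete for Bomb is to score each possible output of the decision procedure as a policy, before the note is read. The sketch below is only an illustration under stated assumptions, not Y&S’s formalism: the error rate and the $100 fee come from the problem statement, while `COST_OF_DEATH` (a dollar-equivalent disutility for burning to death) and the helper name `expected_cost` are hypothetical.

```python
from fractions import Fraction

# From the problem statement: the predictor errs 1 time in a trillion trillion.
ERROR_RATE = Fraction(1, 10**24)
FEE_RIGHT = 100  # dollars paid to take the Right box

# Assumption (not in the source): a finite dollar-equivalent cost of dying.
COST_OF_DEATH = 10**9


def expected_cost(policy: str) -> Fraction:
    """Expected cost for an agent whose decision procedure always outputs `policy`."""
    if policy == "Left":
        # Predictor right: it foresaw Left, so there is no bomb and Left is free.
        # Predictor wrong: it foresaw Right, so the bomb is in Left.
        return (1 - ERROR_RATE) * 0 + ERROR_RATE * COST_OF_DEATH
    if policy == "Right":
        # A Right-boxer pays the fee whether or not the predictor erred.
        return Fraction(FEE_RIGHT)
    raise ValueError(f"unknown policy: {policy}")


best_policy = min(["Left", "Right"], key=expected_cost)

# Left-boxing stays ahead as long as death costs less than this many dollars.
break_even_cost_of_death = FEE_RIGHT / ERROR_RATE

print(best_policy, float(expected_cost("Left")), break_even_cost_of_death)
```

Under these assumptions the Left-boxing policy has the lower expected cost, and it keeps that lead unless dying is valued above 10^26 dollars. The sketch only shows what a policy-level comparison says; whether policy-level scoring is the right criterion once the note has been read is exactly what the two sides dispute.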

“…if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left.”

But since you already know the bomb is in Left, you could easily save your life by paying $100 in this specific situation, and that’s where our disagreement comes from. However, remember that if your decision theory makes you a left-boxer, you virtually never end up in the above situation! In 999,999,999,999,999,999,999,999 out of 1,000,000,000,000,000,000,000,000 situations, the predictor will have predicted you left-box, letting you keep your life for free. As Vaniver says in a comment:

“Note that the Bomb case is one in which we condition on the 1 in a trillion trillion failure case, and ignore the 999999999999999999999999 cases in which FDT saves $100. This is like pointing at people who got into a plane that crashed and saying ‘what morons, choosing to get on a plane that would crash!’ instead of judging their actions from the state of uncertainty that they were in when they decided to get on the plane.”

## “FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it”

Here wdmacaskill argues that in Newcomb’s problem, FDT recommends one-boxing if it assumes the predictor (Omega) is running a simulation of the agent’s decision process. But what if Omega isn’t running your algorithm? What if they use something else to predict your choice? To use wdmacaskill’s own example:

“Perhaps the Scots tend to one-box, whereas the English tend to two-box.”

Well, in that case Omega’s prediction and your decision (one-boxing or two-boxing) aren’t subjunctively dependent on the same function. And this kind of dependence is key in FDT’s decision to one-box! Without it, FDT recommends two-boxing, like CDT. In this particular version of Newcomb’s problem, your decision procedure has no influence on Omega’s prediction, and you should go for strategic dominance (two-boxing).

However, wdmacaskill argues that part of the original motivation to develop FDT was to have a decision theory that one-boxes on Newcomb’s problem. I don’t care what the original motivation for FDT was with respect to this discussion. What matters is whether FDT gets Newcomb’s problem right — and it does so in both cases: when Omega does run a simulation of your decision process and when Omega does not.

Alternatively, wdmacaskill argues, “Y&S could accept that the decision-maker should two-box in the cases given above.
But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.”

Again, I do not care where the case for one-boxing stemmed from, or what FDT’s original motivation was: I care about whether FDT gets Newcomb’s problem right.

## “Implausible discontinuities”

“First, take some physical processes S (like the lesion from the Smoking Lesion) that causes a ‘mere statistical regularity’ (it’s not a Predictor). And suppose that the existence of S tends to cause both (i) one-boxing tendencies and (ii) whether there’s money in the opaque box or not when decision-makers face Newcomb problems. If it’s S alone that results in the Newcomb set-up, then FDT will recommending two-boxing.”

Agreed. The contents of the opaque box and the agent’s decision to one-box or two-box don’t subjunctively depend on the same function. FDT would indeed recommend two-boxing.

“But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.”

No! The critical factor isn’t whether “there’s an agent making predictions”. The critical factor is subjunctive dependence between the agent and another relevant physical system (in Newcomb’s problem, that’s Omega’s prediction algorithm). Since in this last problem put forward by wdmacaskill the prediction depends on looking at S, there is no such subjunctive dependence going on and FDT would recommend two-boxing.
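
The role that subjunctive dependence plays in these cases can be sketched in a few lines. This is a toy model, not the formal FDT of Y&S: the payoffs ($1,000,000 and $1,000, the figures usually used for Newcomb’s problem), the 99% accuracy, the 50% prior, and all function names are assumptions chosen for illustration.

```python
MILLION = 1_000_000  # assumed content of the opaque box when it is full
THOUSAND = 1_000     # assumed content of the transparent box


def expected_payoff(action: str, p_full: float) -> float:
    """Expected payoff of `action` when the opaque box is full with probability p_full."""
    bonus = THOUSAND if action == "two-box" else 0
    return p_full * MILLION + bonus


def fdt_choice(depends_on_my_procedure: bool,
               accuracy: float = 0.99,
               prior_full: float = 0.5) -> str:
    """Pick the output of the decision procedure with the best expected payoff."""
    def p_full(action: str) -> float:
        if depends_on_my_procedure:
            # The prediction is computed from the agent's own decision procedure,
            # so it co-varies with whichever output is being considered.
            return accuracy if action == "one-box" else 1 - accuracy
        # The box was filled based on something else (nationality, a lesion S,
        # an agent reading S), so its content is fixed whatever the output is.
        return prior_full

    return max(["one-box", "two-box"],
               key=lambda action: expected_payoff(action, p_full(action)))


print(fdt_choice(depends_on_my_procedure=True))   # simulation-style predictor
print(fdt_choice(depends_on_my_procedure=False))  # lesion-style or nationality-style predictor
```

In this toy model the recommendation changes only with the `depends_on_my_procedure` flag. Whether the thing filling the box is an agent never enters the calculation, which mirrors the claim that subjunctive dependence, not the presence of a predicting agent, is the critical factor.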
Wdmacaskill further asks the reader to imagine a spectrum of more and more agent-like S, and imagines that at some point there will be a “sharp jump” where FDT goes from recommending two-boxing to recommending one-boxing. Wdmacaskill then says:

“Second, consider that same physical process S, and consider a sequence of Newcomb cases, each of which gradually make S more and more complicated and agent-y, making it progressively more similar to a Predictor making predictions. At some point, on FDT, there will be a point at which there’s a sharp jump; prior to that point in the sequence, FDT would recommend that the decision-maker two-boxes; after that point, FDT would recommend that the decision-maker one-boxes. But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.”

But like I explained, the “agent-ness” of a physical system is totally irrelevant for FDT. Subjunctive dependence is key, not agent-ness. The sharp jump between one-boxing and two-boxing wdmacaskill imagines there to be really isn’t there: it stems from a misunderstanding of FDT.

## “FDT is deeply indeterminate”

Wdmacaskill argues that “there’s no objective fact of the matter about whether two physical processes A and B are running the same algorithm or not, and therefore no objective fact of the matter of which correlations represent implementations of the same algorithm or are ‘mere correlations’ of the form that FDT wants to ignore.”

… and gives an example:

“To see this, consider two calculators. The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not?
Well, perhaps on this foreign calculator the ‘–’ symbol means what we usually take it to mean — namely, that the ensuing number is negative — and therefore every time we hit the ‘=’ button on the second calculator we are asking it to run the algorithm ‘compute the sum entered, then output the negative of the answer’. If so, then the calculators are systematically running different algorithms. But perhaps, in this foreign land, the ‘–’ symbol, in this context, means that the ensuing number is positive and the lack of a ‘–’ symbol means that the number is negative. If so, then the calculators are running exactly the same algorithms; their differences are merely notational.”

I’ll admit I’m no expert in this area, but it seems clear to me that these calculators are running different algorithms, but that both algorithms are subjunctively dependent on the same function! Both algorithms use the same “sub-algorithm”, which calculates the correct answer to the user’s input. The second calculator just does something extra: put a negative sign in front of the answer or remove an existing one. Whether inhabitants of the foreign land interpret the ‘–’ symbol differently than we do is irrelevant to the properties of the calculators.

“Ultimately, in my view, all we have, in these two calculators, are just two physical processes. The further question of whether they are running the same algorithm or not depends on how we interpret the physical outputs of the calculator.”

It really doesn’t. The properties of both calculators do NOT depend on how we interpret their outputs. Wdmacaskill uses this supposed dependence on interpretation to undermine FDT: in Newcomb’s problem, it would also be a matter of choice of interpretation whether Omega is running the same algorithm as you are in order to predict your choice. However, as interpretation isn’t a property of any algorithm, this becomes a non-issue. I’ll be doing a longer post on algorithm dependence/similarity in the future.
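
The “shared sub-algorithm” reading of the two calculators can be sketched as follows. This is only an illustration of that reading: the names `compute_sum`, `calculator_ours`, and `calculator_foreign` are hypothetical, and the calculators are reduced to adding integers.

```python
def compute_sum(expression: str) -> int:
    """Shared sub-algorithm: add the integers in an expression such as '2+3'."""
    return sum(int(term) for term in expression.split("+"))


def calculator_ours(expression: str) -> int:
    """The familiar calculator: output the computed answer unchanged."""
    return compute_sum(expression)


def calculator_foreign(expression: str) -> int:
    """The foreign calculator: same computation, then flip the sign."""
    return -compute_sum(expression)


for expression in ["2+3", "10+20+30", "0+0"]:
    ours = calculator_ours(expression)
    foreign = calculator_foreign(expression)
    # The outputs differ, yet each is fixed by the same call to compute_sum.
    print(expression, ours, foreign)
```

The two top-level functions are different algorithms, but any change to `compute_sum` would change both outputs together, which is the kind of dependence meant here. How a reader interprets the printed sign does not appear anywhere in the code.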
## “But FDT gets the most utility!”

Here, wdmacaskill talks about how Yudkowsky and Soares compare FDT to EDT and CDT to determine FDT’s superiority to the other two.

“As we can see, the most common formulation of this criterion is that they are looking for the decision theory that, if run by an agent, will produce the most utility over their lifetime. That is, they’re asking what the best decision procedure is, rather than what the best criterion of rightness is, and are providing an indirect account of the rightness of acts, assessing acts in terms of how well they conform with the best decision procedure. But, if that’s what’s going on, there are a whole bunch of issues to dissect. First, it means that FDT is not playing the same game as CDT or EDT, which are proposed as criteria of rightness, directly assessing acts. So it’s odd to have a whole paper comparing them side-by-side as if they are rivals.”

I agree the whole point of FDT is to have a decision theory that produces the most utility over the lifetime of an agent — even if that, in very specific cases like Bomb, results in “weird” (but correct!) recommendations for specific acts. Looking at it from a perspective of AI Alignment — which is the goal of MIRI, the organization Yudkowsky and Soares work for — it seems clear to me that that’s what you want out of a decision theory. CDT and EDT may have been invented to play a different game — but that’s irrelevant for the purpose of FDT. CDT and EDT — the big contenders in the field of decision theory — fail this purpose, and FDT does better.

“Second, what decision theory does best, if run by an agent, depends crucially on what the world is like. To see this, let’s go back to question that Y&S ask of what decision theory I’d want my child to have.
This depends on a whole bunch of empirical facts: if she might have a gene that causes cancer, I’d hope that she adopts EDT; though if, for some reason, I knew whether or not she did have that gene and she didn’t, I’d hope that she adopts CDT. Similarly, if there were long-dead predictors who can no longer influence the way the world is today, then, if I didn’t know what was in the opaque boxes, I’d hope that she adopts EDT (or FDT); if I did know what was in the opaque boxes (and she didn’t) I’d hope that she adopts CDT. Or, if I’m in a world where FDT-ers are burned at the stake, I’d hope that she adopts anything other than FDT.”

Well, no, not really — that’s the point. What decision theory does best shouldn’t depend on what the world is like. The whole idea is to have a decision theory that does well under all (fair) circumstances. Circumstances that directly punish an agent for its decision theory can be made for any decision theory and don’t refute this point.

“Third, the best decision theory to run is not going to look like any of the standard decision theories. I don’t run CDT, or EDT, or FDT, and I’m very glad of it; it would be impossible for my brain to handle the calculations of any of these decision theories every moment. Instead I almost always follow a whole bunch of rough-and-ready and much more computationally tractable heuristics; and even on the rare occasions where I do try to work out the expected value of something explicitly, I don’t consider the space of all possible actions and all states of nature that I have some credence in — doing so would take years. So the main formulation of Y&S’s most important principle doesn’t support FDT. And I don’t think that the other formulations help much, either. Criteria of how well ‘a decision theory does on average and over time’, or ‘when a dilemma is issued repeatedly’ run into similar problems as the primary formulation of the criterion.
Assessing by how well the decision-maker does in possible worlds that she isn’t in fact in doesn’t seem a compelling criterion (and EDT and CDT could both do well by that criterion, too, depending on which possible worlds one is allowed to pick).”

Okay, so we’d need an approximation of such a decision theory — I fail to see how this undermines FDT.

“Fourth, arguing that FDT does best in a class of ‘fair’ problems, without being able to define what that class is or why it’s interesting, is a pretty weak argument. And, even if we could define such a class of cases, claiming that FDT ‘appears to be superior’ to EDT and CDT in the classic cases in the literature is simply begging the question: CDT adherents claims that two-boxing is the right action (which gets you more expected utility!) in Newcomb’s problem; EDT adherents claims that smoking is the right action (which gets you more expected utility!) in the smoking lesion. The question is which of these accounts is the right way to understand ‘expected utility’; they’ll therefore all differ on which of them do better in terms of getting expected utility in these classic cases.”

Yes, fairness would need to be defined exactly, although I do believe Yudkowsky and Soares have done a good job at it. And no: “claiming that FDT ‘appears to be superior’ to EDT and CDT in the classic cases in the literature” isn’t begging the question. The goal is to have a decision theory that consistently gives the most expected utility. Being a one-boxer does give you the most expected utility in Newcomb’s problem. Deciding to two-box after Omega made his prediction that you one-box (if this is possible) would give you the most utility — but you can’t have your decision theory recommending two-boxing, because that results in the opaque box being empty.

In conclusion, it seems FDT survives the critique offered by wdmacaskill.
I am quite new to the field of decision theory, and will be learning more and more about this amazing field in the coming weeks. This post might be updated as I learn more.

• The statement of Bomb is bad at being legible outside the FDT/UDT paradigm, it’s instead actively misleading there, so is a terrible confusion-conflict-and-not-clarity inducing example to show someone who is not familiar with it. The reason Left is reasonable is that the scenario being described is, depending on the chosen policy, almost completely not real, a figment of predictor’s imagination.

Unless you’ve read a lot of FDT/UDT discussion, a natural reading of a thought experiment is to include the premise “the described situation is real”. And so people start talking past each other, digging into the details of how to reason about the problem, when the issue is that they read different problem statements, one where the scenario is real, and another where its reality is not at all assured.

• What does it mean to say that “the described scenario is real” is not a premise of the thought experiment…? (What could the thought experiment even be about if the described scenario is not supposed to be real?)

• UDT is about policies, not individual decisions. A thought experiment typically describes an individual decision taken in some situation. A policy specifies what decisions are to be taken in all situations. Some of these situations are impossible, but the policy is still defined for them following its type signature, and predictors can take a look at what exactly happens in the impossible situations. Furthermore, choice of a policy influences which situations are impossible, so there is no constant fact about which of them are impossible. The general case of making a decision in an individual situation involves uncertainty about whether this situation is impossible, and ability to influence its impossibility.
This makes the requirement for thought experiments to describe real situations an unnatural constraint, so in thought experiments read in context of UDT this requirement is absent by default.

A central example is Transparent Newcomb’s Problem. When you see money in the big box, this situation is either possible (if you one-box) or impossible (if you two-box), depending on your decision. If a thought experiment is described as you facing the problem in this situation (with the full box), it’s describing a situation (and observations made there) that may, depending on your decision, turn out to be impossible. Yet asking for what your decision in this situation will be is always meaningful, because it’s possible to evaluate you as an algorithm even on impossible observations, which is exactly what all the predictors in such thought experiments are doing all the time.

“What could the thought experiment even be about if the described scenario is not supposed to be real?”

It’s about evaluating you (or rather an agent) as an algorithm on the observations presented by the scenario, which is possible to do regardless of whether the scenario can be real. This in turn motivates asking what happens in other situations, not explicitly described in the thought experiment. A combined answer to all such questions is a policy.

• I get the feeling, reading this, that you are using the word “impossible” in an unusual way. Is this the case? That is, is “impossible” a term of art in decision theory discussions, with a meaning different than its ordinary one? If not, then I confess that I can’t make sense of much of what you say…

• By “impossible” I mean not happening in actuality (which might be an ensemble, in which case I’m not counting what happens with particularly low probabilities), taking into account the policy that the agent actually follows. So the agent may have no way of knowing if something is impossible (and often won’t before actually making a decision).
This actuality might take place outside the thought experiment, for example in Transparent Newcomb that directly presents you with two full boxes (that is, both boxes being full is part of the description of the thought experiment), and where you decide to take both, the thought experiment is describing an impossible situation (in case you do decide to take both boxes), while the actuality has the big box empty.

So for the problem where you-as-money-maximizer choose between receiving $10 and $5, and actually have chosen $10, I would say that taking $5 is impossible, which might be an unusual sense of the word (possibly misleading before making the decision; 5-and-10 problem is about what happens if you take this impossibility too seriously in an unhelpful way). This is the perspective of an external Oracle that knows everything and doesn’t make public predictions.

If this doesn’t clear up the issue, could you cite a small snippet that you can’t make sense of and characterize the difficulty? Focusing on Transparent-Newcomb-with-two-full-boxes might help (with respect to use of “impossible”, not considerations on how to solve it), it’s way cleaner than Bomb.

(The general difficulty might be from the sense in which UDT is a paradigm, its preferred ways of framing its natural problems are liable to be rounded off to noise when seen differently. But I don’t know what the difficulty is on object level in any particular case, so calling this a “paradigm” is more of a hypothesis about the nature of the difficulty that’s not directly helpful.)

• “By ‘impossible’ I mean not happening in actuality (which might be an ensemble, in which case I’m not counting what happens with particularly low probabilities)”

Sorry, do you mean that you don’t count low-probability events as impossible, or that you don’t count them as possible (a.k.a. “happening in actuality”)?

“So the agent may have no way of knowing if something is impossible (and often won’t before actually making a decision).”
This is an example of a statement that seems nonsensical to me. If I am an agent, and something is happening to me, that seems to me to be real by definition. (As Eliezer put it: “Whatever is, is real.”) And anything that is real, must (again, by definition) be possible…

If what is happening to me is actually happening in a simulation… well, so what? The whole universe could be a simulation, right? How does that change anything?

So the idea of “this thing that is happening to you right now is actually impossible” seems to me to be incoherent.

I… have considerable difficulty parsing what you’re saying in the second paragraph of your comment. (I followed the link, and a couple more from it, and was not enlightened, unfortunately.)

• “If I am an agent, and something is happening to me”

The point is that you don’t know that something is happening to you just because you are seeing it happen. Seeing it happen is what takes place when you-as-an-algorithm is evaluated on the corresponding observations. A response to seeing it happen is well-defined even if the algorithm is never actually evaluated on those observations. When we spell out what happens inside the algorithm, what we see is that the algorithm is “seeing it happen”. This is so even if we don’t actually look. (See also.)

So for example, if I’m asking what would be your reaction to the sky turning green, what is the status of you-in-the-question who sees the sky turn green? They see it happen in the same way that you see it not happen. Yet from the fact that they see it happen, it doesn’t follow that it actually happens (the sky is not actually green).

Another point is that for you-in-the-question, it might be the green-sky world that matters, not the blue-sky world. That is a side effect of how your insertion into the green-sky world doesn’t respect the semantics of your preferences, which care about blue-sky world.
For you-in-the-question with preferences ending up changed to care about the green-sky world, the useful sense of actuality refers to the green-sky world, so that for them it’s the blue-sky world that’s impossible. But if agents share preferences, this kind of thing doesn’t happen. (This is another paragraph that doesn’t respect rabbit hole safety regulations.)

“If what is happening to me is actually happening in a simulation… well, so what?”

You typically don’t know that some observation is taking place even in a simulation, yet your response to that observation that never happens in any form, and is not predicted by any predictor, is still well-defined. It makes sense to ask what it is.

“Sorry, do you mean that you don’t count low-probability events as impossible, or that you don’t count them as possible (a.k.a. ‘happening in actuality’)?”

I mean that if something does happen in actuality-as-ensemble with very low probability, that doesn’t disqualify it from being impossible according to how I’m using the word. Without this caveat literally nothing would be impossible in some settings.

“I… have considerable difficulty parsing what you’re saying in the second paragraph of your comment.”

The link is not helpful here, it’s more about what goes wrong when my sense of “impossible” is taken too far, for reasons that have nothing to do with word choice (it perhaps motivates this word choice a little bit). The use of that paragraph is in what’s outside the parenthetical. It’s intended to convey that when you choose between options A and B, it’s usually said that both taking A and taking B are possible, while my use of “impossible” in this thread is such that the option that’s not actually taken is instead impossible.

• “So for example, if I’m asking what would be your reaction to the sky turning green, what is the status of you-in-the-question who sees the sky turn green? They see it happen in the same way that you see it not happen. Yet from the fact that they see it happen, it doesn’t follow that it actually happens (the sky is not actually green).”

If the sky were to turn green, I would certainly behave as if it had indeed turned green; I would not say “this is impossible and isn’t happening”. So I am not sure what this gets us, as far as explaining anything…

“Another point is that for you-in-the-question, it might be the green-sky world that matters, not the blue-sky world. That is a side effect of how your insertion into the green-sky world doesn’t respect the semantics of your preferences, which care about blue-sky world. For you-in-the-question with preferences ending up changed to care about the green-sky world, the useful sense of actuality refers to the green-sky world, so that for them it’s the blue-sky world that’s impossible. But if agents share preferences, this kind of thing doesn’t happen. (This is another paragraph that doesn’t respect rabbit hole safety regulations.)”

My preferences “factor out” the world I find myself in, as far as I can tell. By “agents share preferences” are you suggesting a scenario where, if the sky were to turn green, I would immediately stop caring about anything whatsoever that happened in that world, because my preferences were somehow defined to be “about” the world where the sky were still blue? This seems pathological. I don’t think it makes any sense to say that I “care about the blue-sky world”; I care about what happens in whatever world I am actually in, and the sky changing color wouldn’t affect that.

“The point is that you don’t know that something is happening to you just because you are seeing it happen.”

Well, if something’s not actually happening, then I’m not actually seeing it happen. I don’t think your first paragraph makes sense, sorry.
“You typically don’t know that some observation is taking place even in a simulation, yet your response to that observation that never happens in any form, and is not predicted by any predictor, is still well-defined. It makes sense to ask what it is.”

Does it? I’m not sure that it does, actually… if something never happens, and I never observe it, then I never respond to it, either. My response to it is nothing.

You can ask: “but if it did happen, what would be your response?”—and that’s a reasonable question. But any answer to that question would indeed have to take as given that the event in question were in fact actually happening (otherwise the question is meaningless).

“I mean that if something does happen in actuality-as-ensemble with very low probability, that doesn’t disqualify it from being impossible according to how I’m using the word. Without this caveat literally nothing would be impossible in some settings.”

Well… that is a very unusual use of “impossible”, yes. Might I suggest using a different word? You seem to be saying: “yes, certain things that can happen are impossible”, which is very much counter to all ordinary usage. I think using a word in this way can only lead to confusion…

(The last paragraph of your comment doesn’t elucidate much, but perhaps that is because of the aforesaid odd word usage.)

• “Well, if something’s not actually happening, then I’m not actually seeing it happen.”

Not actually, you seeing it happen isn’t real, but this unreality of seeing it happen proceeds in a specific way. It’s not indeterminate greyness, and not arbitrary.

“if something never happens, and I never observe it, then I never respond to it, either. My response to it is nothing.”

If your response (that never happens) could be 0 or 1, it couldn’t be nothing. If it’s 0 (despite never having been observed to be 0), the claim that it’s 1 is false, and the claim that it’s nothing doesn’t type check.
I’m guessing that the analogy between you and an algorithm doesn’t hold strongly in your thinking about this, it’s the use of “you” in place of “algorithm” that does a lot of work in these judgements that wouldn’t happen for talking about an “algorithm”. So let’s talk about algorithms to establish common ground.

Let’s say we have a pure total procedure f written in some programming language, with the signature f : O → D, where O = Texts is the type of observations and D = {0,1} is the type of decisions. Let’s say that in all plausible histories of the world, f is never evaluated on argument “green sky”. In this case I would say that it’s impossible for the argument (observation) to be “green sky”, procedure f is never evaluated with this argument in actuality.

Yet it so happens that f(“green sky”) is 0. It’s not 1 and not nothing. There could be processes sensitive to this fact that don’t specifically evaluate f on this argument. And there are facts about what happens inside f with intermediate variables or states of some abstract machine that does the evaluation (procedure f’s experience of observing the argument and formulating a response to it), as it’s evaluated on this never-encountered argument, and these facts are never observed in actuality, yet they are well-defined by specifying f and the abstract machine.

“You can ask: ‘but if it did happen, what would be your response?’—and that’s a reasonable question. But any answer to that question would indeed have to take as given that the event in question were in fact actually happening (otherwise the question is meaningless).”

The question of what f(“green sky”) would evaluate to isn’t meaningless regardless of whether evaluation of f on the argument “green sky” is an event that in fact actually happens. Actually extant evidence for a particular answer, such as a proof that the answer is 0, is arguably also evidence of the evaluation having taken place.
But reasoning about the answer doesn’t necessarily pin it down exactly, in which case the evaluation didn’t necessarily take place. For example, perhaps we only know that f(“green sky”) is the same as g(“blue sky”), but don’t know what the values are. Actually proving this equality doesn’t in general require either f(“green sky”) or g(“blue sky”) to be actually evaluated. You seem to be saying: “yes, certain things that can happen are impossible”, which is very much counter to all ordinary usage. Winning a billion dollars on the stock market by following the guidance of a random number generator technically “can happen”, but I feel it’s a central example of something impossible in ordinary usage of the word. I also wouldn’t say that it can happen, without the scare quotes, even though technically it can. I would not say “this is impossible and isn’t happening”. This is mostly relevant for decisions between influencing one world and influencing another, possible when there are predictors looking from one world into the other. I don’t think behavior within-world (in ordinary situations) should significantly change depending on its share of reality, but also I don’t see a problem with noticing that the share of reality of some worlds is much smaller than for some other worlds. Another use is manipulating a predictor that imagines you seeing things that you (but not the predictor) know can’t happen, and won’t notice you noticing. • Well, if something’s not actually happening, then I’m not actually seeing it happen. Not actually, you seeing it happen isn’t real, but this unreality of seeing it happen proceeds in a specific way. It’s not indeterminate greyness, and not arbitrary. What do you mean, “proceeds in a specific way”? It doesn’t proceed at all. Because it’s not happening, and isn’t real. if something never happens, and I never observe it, then I never respond to it, either. My response to it is nothing. 
If your response (that never happens) could be 0 or 1, it couldn’t be nothing. If it’s 0 (despite never having been observed to be 0), the claim that it’s 1 is false, and the claim that it’s nothing doesn’t type check. This seems wrong to me. If my response never happens, then it’s nothing; it’s the claim that it’s 1 that doesn’t type check, as does the claim that it’s 0. It can’t be either 1 or 0, because it doesn’t happen. (In algorithm terms, if you like: what is the return value of a function that is never called? Nothing, because it’s never called and thus never returns anything. Will that function return 0? No. Will it return 1? Also no.) Let’s say we have a pure total procedure f written in some programming language, with the signature f : O → D, where O = Texts is the type of observations and D = {0,1} is the type of decisions. Let’s say that in all plausible histories of the world, f is never evaluated on argument “green sky”. In this case I would say that it’s impossible for the argument (observation) to be “green sky”, procedure f is never evaluated with this argument in actuality. (Reference for readers who may not be familiar with the relevant terminology, as I was not: Pure Functions and Total Functions.) There could be processes sensitive to this fact that don’t specifically evaluate f on this argument. Please elaborate! The question of what f(“green sky”) would evaluate to isn’t meaningless regardless of whether evaluation of f on the argument “green sky” is an event that in fact actually happens. Indeed, but the question of what f(“green sky”) actually returns, certainly is meaningless if f(“green sky”) is never evaluated. Actually extant evidence for a particular answer, such as a proof that the answer is 0, is arguably also evidence of the evaluation having taken place. But reasoning about the answer doesn’t necessarily pin it down exactly, in which case the evaluation didn’t necessarily take place. 
For example, perhaps we only know that f(“green sky”) is the same as g(“blue sky”), but don’t know what the values are. Actually proving this equality doesn’t in general require either f(“green sky”) or g(“blue sky”) to be actually evaluated. I’m afraid I don’t see what this has to do with anything… Winning a billion dollars on the stock market by following the guidance of a random number generator technically “can happen”, but I feel it’s a central example of something impossible in ordinary usage of the word. I also wouldn’t say that it can happen, without the scare quotes, even though technically it can. I strongly disagree that this matches ordinary usage! … predictors looking from one world into the other … I am not sure what you mean by this? (Or by the rest of your last paragraph, for that matter…) • By “impossible” I mean not happening in actuality Thats pretty non-standard. Sorry, do you mean that you don’t count low-probability events as impossible, or that you don’t count them as possible (a.k.a. “happening in actuality”)? I think you need to answer that. • Here’s an attempt to ground this somewhat concretely. Suppose there’s an iterated prisoner’s dilemma contest. At any iteration an agent can look at the history of plays that itself and its opponent have made. Suppose that TitForTatBot looks at the history, and sees that there’s been 100 rounds so far, and in every one it has defected and its opponent has cooperated. It proceeds to cooperate, because its opponent cooperated in the previous round. And so the “actual” game history will never be (D,C) x 100. What’s happened here is that someone has instantiated a TitForTatBot and lied to it. It’s not impossible that TitForTatBot will observe this history, but it’s impossible that this history actually happened, in some sense that I claim we care about. • Hmm, no, I still don’t think this works. 
In the scenario you describe, it seems to me that TitForTatBot neither observed the specified history, nor did it actually happen—but it does observe finding itself in a scenario where that history (apparently) happened, and it does indeed actually find itself in a scenario where that history (apparently) happened. Now, I think that your example does bring up an interesting and relevant point, namely: when should an agent question whether some of the things it seems to know or observe are actually false or illusionary? Surely the answer is not “never”, else the agent will be easy to fool, and will make some very foolish decisions! So perhaps TitForTatBot (if we suppose that it’s not just a “bot” but also has some higher reasoning functions) might think: “Hmm, I defected 100 times? Sounds made-up, I think somebody’s been tampering with my memory! The proverbial evil neurosurgeons strike again!” But consider how this might work in the “Bomb” case. Should I find myself in the “Bomb” scenario, I might think: “A predictor that’s only been wrong one out of a trillion trillion times? And it’s just been wrong again? And there’s a bomb in this here Left box, and me an FDT agent, no less! Something doesn’t add up… perhaps one or more of the things I think I know, aren’t so!” And this seems like a reasonable enough thought. But surely it would then be far more reasonable to question the whole “one-in-a-trillion-trillion-accurate predictor” business, than to say “This bomb I see in front of me is fake, and the box is also fake! This whole scenario is fake!” Right? I mean… how do I know this stuff about the predictor, and its accuracy? It’s a pretty outlandish claim, isn’t it—one mistake out of a trillion trillion? How sure am I that I’m privy to all the information about the predictor’s past performance? And really, the whole situation is weird: I’m the last person in existence, apparently? 
And so on… but the reality of me being alive, not wanting to die, and staring at an actual bomb right in front of me—well, if I trust anything, I’ll trust the evidence of my senses before I trust in some stuff I’ve been told about a long-dead predictor, or what have you. Anyway, this seems to me to be the kind of skepticism that makes sense in a situation like this. And none of it seems to lead to the sort of analysis described by the FDT proponents… • Unless you’ve read a lot of FDT/​UDT discussion, a natural reading of a thought experiment is to include the premise “the described situation is real”. Which is why I encouraged the reader unfamiliar with FDT to read the Yudkowsky & Soares paper first. • I read through this long enough to come to the conclusion that the author of the original article simply does not understand FDT rather than having valid criticisms of it, and stopped there, that being perfectly sufficient to refute the article. • by running a simulation of you and seeing what that simulation did. A simulation of your choice “upon seeing a bomb in the Left box under this scenario”? In that case, the choice to always take the Right box “upon seeing a bomb in the Left box under this scenario” is correct, and what any of the decision theories would recommend. Being in such a situation does necessitate the failure of the predictor, which means you are in a very improbable world, but that is not relevant to your decision in the world you happen to be in (simulated or not). Or: A simulation of your choice in some different scenario (e.g. not seeing the contents of the boxes)? In that simulation, you would choose some box, but regardless of what that decision would happen to be, you are free to pick the Right box in this scenario, because it is a different scenario. Perhaps you picked Left in the alternative scenario, perhaps the predictor failed; neither is relevant here. Why would any decision theory ever choose “Left” in this scenario? 
• A simulation of your choice “upon seeing a bomb in the Left box under this scenario”? In that case, the choice to always take the Right box “upon seeing a bomb in the Left box under this scenario” is correct, and what any of the decision theories would recommend. Good point. It seems to me Left-boxing is still the right answer though, since your decision procedure would still ‘force’ the predictor to predict you Left-box. • What does it mean to Left-box, exactly? As in, under what specific scenarios are you making a choice between boxes, and choosing the Left box? • Off-topic: I initially misread this title as “A defense of density functional theory,” and was intrigued. • There are two huge ambiguities in this scenario: 1. did Predictor include the note in the simulation, or write it later? If there was even a small (say, anything more than 1 in a million) chance that it was written later, the agent should pick Right. 2. does Predictor always add a note showing the prediction in this scenario? We can rule out the combination of both together. It is not possible for Predictor to always write a note that honestly records their prediction including the note and still guarantee 10^-24 chance of prediction error. If the note has nontrivial chance of being a lie, then the agent should always pick Right. So to examine the scenario under FDT, we can assume that the note is genuine, in the case where there is any note at all. So there are at least eight decision functions, one for every combination inputs of “note says Left”, “note says Right”, or “no note”. We have no information about the circumstances in which the predictor leaves a note or not, and this matters! If the predictor is adversarial to the extent that they are able to be within the bounds of their 10^-24 prediction error, then FDT does indeed say that the agent should pick Left whenever the predictor says “I predicted Right”. 
A decision function that picks right upon seeing a note saying “I predicted right” would mean that the predictor can force the agent to almost always pick right and pay $100. The predictor can’t force the agent to burn to death by leaving truthful notes with probability more than 10^-24, so this consideration dominates.
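As a rough illustration of the count mentioned above (a sketch, not part of the original analysis): if the agent's observation is one of three cases and its output is one of two boxes, there are 2^3 = 8 deterministic decision functions. The names `OBSERVATIONS`, `ACTIONS`, and `all_decision_functions` are hypothetical, chosen only for this example.

```python
from itertools import product

# Hypothetical labels for the three possible observations and the two actions.
OBSERVATIONS = ("note says Left", "note says Right", "no note")
ACTIONS = ("Left", "Right")


def all_decision_functions():
    """Return every deterministic map from observation to action, as dicts."""
    return [
        dict(zip(OBSERVATIONS, choice))
        for choice in product(ACTIONS, repeat=len(OBSERVATIONS))
    ]


if __name__ == "__main__":
    # Print each decision function on its own line.
    for index, function in enumerate(all_decision_functions(), start=1):
        print(index, function)
```

Each dictionary is one candidate policy; the analysis in this comment amounts to asking which of these policies does best against a given kind of predictor.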

However if the predictor is helpful, then FDT says that the agent should only pick Left when they see a note saying that the prediction was Left. This means that the agent never burns to death, and has only about 10^-24 chance of paying $100. Every other decision function means a comparable chance of burning to death, which is very much worse.

Edit: The previous is all meaningless, because I misread the statement on Predictor’s accuracy. FDT does not endorse taking the left box in this scenario as stated.

• Can you explain point 1 further, please? It seems to me subjunctive dependence happens regardless of note inclusion, and thus one’s decision theory should left-box in both cases. (I’ll respond to your other points as well.)

• If the note was not included in the simulation, then under FDT there is no subjunctive dependence: the output produced by the simulator is for a different input than the one you actually experienced. In the usual FDT analogy, the fact that both you and Predictor are almost certainly using the same type of calculator means nothing if you’re pressing different buttons. We’re told about Predictor’s simulation fidelity, but that doesn’t mean anything if the inputs to the simulation are not the same as reality.

You can work through FDT with the assumption that a note is with probability p written after simulating you (with fidelity 1 − 10^-24) without a note, and it says that for all but microscopic p you should choose Right. This is a boring scenario and doesn’t illustrate any differences between decision theories, so I didn’t bother to expand on it.

Edit: The previous is all pointless due to misreading the statement about Predictor’s accuracy. FDT recommends taking the Right box in this scenario regardless of whether points 1 and 2 hold.

• I should note that my previous comment is all theoretical wankery. In practice, there is no way that I’ll accept any evidence that a predictor has 10^-24 chance of being wrong.
I’m going to take the right box. I won’t even trust that the right box won’t blow up, since the scenario I’ve been kidnapped into has obviously been devised by a sadistic bastard, and I wouldn’t put it past them to put bombs in both boxes (or under the floor) no matter what the alleged predictor supposedly thinks. Just maybe there’s a slightly better chance of surviving by paying the $100.

• Re: the Bomb scenario:

It seems to me that the given defense of FDT is, to put it mildly, unsatisfactory. Whatever “fancy” reasoning is proffered, nevertheless the options on offer are “burn to death” or “pay $100”, and the choice is obvious.

FDT recommends knowingly choosing to burn to death? So much the worse for FDT!

FDT has very persuasive reasoning for why I should choose to burn to death? Uh-huh (asks the non-FDT agent), and if you’re so rational, why are you dead?

Counterfactuals, you say? Well, that’s great, but you still chose to burn to death, instead of choosing not to burn to death.

In “Newcomb’s Problem and Regret of Rationality”, Eliezer wrote:

Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.

Similarly, you don’t have to take the Left box because your decision theory says you should. You can just… take the Right box.

And, you know… not burn to death.

(Maybe the real FDT is “use FDT in all the cases except where doing so will result in you burning to death, in which case use not-FDT”? That way you get the good outcome in all 1 trillion trillion cases, eh?)

P.S. Vaniver’s comment seems completely inapplicable to me, since in the “Bomb” scenario it’s not a question of uncertainty at all.

• The question is not which action to take. The question is which decision theory gives the most utility. Any candidate for “best decision theory” should take the left box. This results in a virtually guaranteed saving of $100 and, yes, death by burning in an extremely unlikely scenario. In that unlikely scenario, yes, taking the right box gives the most utility, but that’s answering the wrong question.
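To make the policy-level comparison in the comment above concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that burning to death can be assigned a finite dollar cost (`death_cost`, a hypothetical parameter that is not part of the problem statement), and it evaluates each policy before anything is observed, which is the framing this comment argues for and the one its critics dispute.

```python
from fractions import Fraction

# From the problem statement: failure rate of 1 in a trillion trillion,
# and a $100 fee for taking the Right box.
ERROR_RATE = Fraction(1, 10**24)
COST_RIGHT = 100


def expected_loss_always_left(death_cost):
    """Policy 'always take Left': the agent burns only when the predictor errs.

    death_cost is an assumed dollar equivalent of burning to death.
    """
    return ERROR_RATE * death_cost


def expected_loss_always_right():
    """Policy 'always take Right': the agent pays $100 in every world."""
    return Fraction(COST_RIGHT)


def break_even_death_cost():
    """Dollar cost of death at which both policies have equal expected loss."""
    return COST_RIGHT / ERROR_RATE


if __name__ == "__main__":
    print(expected_loss_always_left(10**9))
    print(expected_loss_always_right())
    print(break_even_death_cost())
```

Under these assumptions, always taking Left has the lower expected loss whenever death is valued below the break-even figure. The sketch says nothing about what to do after the bomb has been observed, which is the point the rest of the thread argues over.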

• This sort of reasoning makes sense if you must decide on which box to take prior to learning the details of your situation (a.k.a. in a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Although see Robyn Dawes’ commentary on such “skewed” gambles. However, we can let this pass here.)

But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!

Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.

So when selecting a decision theory, you may of course feel free to pick the one that says that you must pick Left, and knowingly burn to death, while I will pick the one that says that I can pick whatever I want. One of us will be dead, and the other will be “smiling from atop a heap of utility”.

(“But what about all those other possible worlds?”, you may ask. Well, by construction, I don’t find myself in any of those, so they’re irrelevant to my decision now, in the actual world.)

• Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.

Well, I’d say FDT recognizes that you do choose in advance, because you are predictable. Apparently you have an algorithm running that makes these choices, and the predictor simulates that algorithm. It’s not that you “must” stick to your choice. It’s about constructing a theory that consistently recommends the actions that maximize expected utility.

I know I keep repeating that—but it seems that’s where our disagreement lies. You look at which action is best in a specific scenario, I look at what decision theory produces the most utility. An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.

• An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.

That seems like an argument against “running a decision theory”, then!

Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…

Clearly, I, a human agent placed in the described scenario, could choose either Left or Right. Well, then we should design our AGI in such a way that it also has this same capability.

Obviously, the AGI will in fact (definitionally) be running some algorithm. But whatever algorithm that is, ought to be one that results in it being able to choose (and in fact choosing) Right in the “Bomb” scenario.

What decision theory does that correspond to? You tell me…

• CDT

• That seems like an argument against “running a decision theory”, then!

Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…

Exactly, it doesn’t make sense. It is in fact nonsense, unless you are saying it’s impossible to specify a coherent, utility-maximizing decision theory at all?

Btw, please explain how it’s consistent with what I wrote, because it seems obvious to me it’s not.

• And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.

But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!

Yes, but the point is to construct a decision theory that recommends actions in a way that maximizes expected utility. Recommending left-boxing does that, because it saves you $100 in virtually every world. That’s it, really. You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT. Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.

• And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.

So you say. But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead.

Who knows what I would do in any of those worlds, and what would happen as a result? Who knows what you would do?

In the given scenario, FDT loses, period, and loses really badly and, what is worse, loses in a completely avoidable manner.

You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT.

As I said, this reasoning makes sense if, at the time of your decision, you don’t know what possibility you will end up with (and are thus making a gamble). It makes no sense at all if you are deciding while in full possession of all relevant facts.

Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.

Totally, and the decision theory we need is one that doesn’t make such terrible missteps!

Of course, it is possible to make an argument like: “yes, FDT fails badly in this improbable scenario, but all other available decision theories fail worse / more often, so the best thing to do is to go with FDT”. But that’s not the argument being made here; indeed, you’ve explicitly disclaimed it…

• So you say.
But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead. No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction. There are multiple paths, each with its own probability. The problem description focuses on that one world, yes. But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture. Totally, and the decision theory we need is one that doesn’t make such terrible missteps! Do you agree that recommending left-boxing before the predictor makes its prediction is rational? • No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction. Well, no. We can reason about more worlds. But we can’t actually inspect them. Here’s the question I have, though, which I have yet to see a good answer to. You say: But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture. But why can’t our decision theory recommend “choose Left if and only if it contains no bomb; otherwise choose Right”? (Remember, the boxes are open; we can see what’s in there…) Do you agree that recommending left-boxing before the predictor makes its prediction is rational? I think that recommending no-bomb-boxing is rational. Or, like: “Take the left box, unless of course the predictor made a mistake and put a bomb in there, in which case, of course, take the right box.” • As to inspection, maybe I’m not familiar enough with the terminology there. Re your last point: I was just thinking about that too. And strangely enough I missed that the boxes are open. But wouldn’t the note be useless in that case? 
I will think about this more, but it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in left.”, and FDT doesn’t do this. The problem is, in that case the prediction influences what you end up doing. What if the predictor is malevolent, and predicts you choose right, placing the bomb in left? It could make you lose $100 easily. Maybe if you believed the predictor to be benevolent?

• And strangely enough I missed that the boxes are open.

Well, uh… that is rather an important aspect of the scenario…

… it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in left.” …

Why not?

The problem is, in that case the prediction influences what you end up doing.

Yes, it certainly does. And that’s a problem for the predictor, perhaps, but why should it be a problem for me? People condition their actions on knowledge of past events (including predictions of their actions!) all the time.

What if the predictor is malevolent, and predicts you choose right, placing the bomb in left? It could make you lose $100 easily.

Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…

• Well, uh… that is rather an important aspect of the scenario…

Sure. But given the note, I had the knowledge needed already, it seems. But whatever.

Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…

Didn’t say it was a tricky decision problem. My point was that your strategy is easily exploitable and may therefore not be a good strategy.

• If your strategy is “always choose Left”, then a malevolent “predictor” can put a bomb in Left and be guaranteed to kill you. That seems much worse than being mugged for $100.

• The problem description explicitly states the predictor doesn’t do that, so no.

• I don’t see how that’s relevant. In the original problem, you’ve been placed in this weird situation against your will, where something bad will happen to you (either the loss of $100 or … death). If we’re supposing that the predictor is malevolent, she could certainly do all sorts of things… are we assuming that the predictor is constrained in some way? Clearly, she can make mistakes, so that opens up her options to any kind of thing you like. In any case, your choice (by construction) is as stated: pay $100, or die.

• You don’t see how the problem description preventing it is relevant?

The description doesn’t prevent malevolence, but it does prevent putting a bomb in left if the agent left-boxes.
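The exchange above can be sketched as a small consistency check. This is only an illustration under stated assumptions: the agent’s policy depends solely on whether it sees a bomb in Left, and a prediction counts as “allowed” only if it comes true once the boxes are set up accordingly. The function names are hypothetical.

```python
def bomb_in_left(prediction):
    """Problem statement: the bomb goes in Left iff the prediction is Right."""
    return prediction == "Right"


def always_left(sees_bomb):
    """Policy that takes Left whatever it observes."""
    return "Left"


def left_unless_bomb(sees_bomb):
    """Policy that takes Left only when no bomb is visible."""
    return "Right" if sees_bomb else "Left"


def consistent_predictions(policy):
    """Predictions that turn out correct given the box contents they cause."""
    return [
        prediction
        for prediction in ("Left", "Right")
        if policy(bomb_in_left(prediction)) == prediction
    ]


if __name__ == "__main__":
    print(consistent_predictions(always_left))
    print(consistent_predictions(left_unless_bomb))
```

In this toy model, an accurate predictor facing `always_left` can only predict Left, so it cannot place the bomb without erring. Facing `left_unless_bomb`, either prediction is self-fulfilling, so an accurate but malevolent predictor is free to pick the one that costs the agent $100.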

• This sort of reasoning makes sense if you must decide on which box to take prior to learning the details of your situation (a.k.a. in a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Although see Robyn Dawes’ commentary on such “skewed” gambles. However, we can let this pass here.)

But that’s not the case here.

It is the case, in a way. Otherwise the predictor could not have predicted your action. I’m not saying you actively decide what to do beforehand, but apparently you are running a predictable decision procedure.

• I’m gonna try this one more time from a different angle: what’s your answer on Parfit’s Hitchhiker? To pay or not to pay?

• Pay.

• So even though you are already in the city, you choose to pay and lose utility in that specific scenario? That seems inconsistent with right-boxing on Bomb.

For the record, my answer is also to pay, but then again I also left-box on Bomb.

• Parfit’s Hitchhiker is not an analogous situation, since it doesn’t take place in a context like “you’re the last person in the universe and will never interact with another agent ever”, nor does paying cause me to burn to death (in which case I wouldn’t pay; note that this would defeat the point of being rescued in the first place!).

But more importantly, in the Parfit’s Hitchhiker situation, you have in fact been provided with value (namely, your life!). Then you’re asked to pay a (vastly smaller!) price for that value.

In the Bomb scenario, on the other hand, you’re asked to give up your life (very painfully), and in exchange you get (and have gotten) absolutely nothing whatsoever.

So I really don’t see the relevance of the question…

• Actually, I have thought about this a bit more and concluded Bomb and Parfit’s hitchhiker are indeed analogous in a very important sense: both problems give you the option to “pay” (be it in dollars or with torture and death), even though not paying doesn’t causally affect whether or not you die.

In the Bomb scenario, on the other hand, you’re asked to give up your life (very painfully), and in exchange you get (and have gotten) absolutely nothing whatsoever.

Like Parfit’s hitchhiker, where you are asked to pay $1000 even though you are already rescued.

• since it doesn’t take place in a context like “you’re the last person in the universe and will never interact with another agent ever”

That was never relevant to begin with.

Parfit’s Hitchhiker is not an analogous situation

Well, both problems have a predictor and focus on a specific situation after the predictor has already made the prediction. Both problems have subjunctive dependence. So they are analogous, but they have differences as well. However, it seems like you don’t pay because of subjunctive dependence reasons, so never mind, I guess.

• FDT has very persuasive reasoning for why I should choose to burn to death? Uh-huh (asks the non-FDT agent), and if you’re so rational, why are you dead?

I think the more fundamental issue is that you can construct these sorts of dilemmas for all decision theories. For example, you can easily come up with scenarios where Omega punishes you for following a certain decision theory and rewards you otherwise. The right question to ask is not whether a decision theory recommends something that makes you burn to death in some scenario, but whether it recommends you do so across a broad class of fair dilemmas. I’m not convinced that FDT does that, and the bomb dilemma did not move me much.

• You can of course construct scenarios where Omega punishes you for all sorts of things, but in the given case, FDT recommends a manifestly self-destructive action, in a circumstance where you’re entirely free to instead not take that action. Other decision theories do not do this (whatever their other faults may be).

The right question to ask is not whether a decision theory recommends something that makes you burn to death in some scenario

But of course it is the right question. The given dilemma is perfectly fair.
FDT recommends that you knowingly choose to burn to death, when you could instead not choose to burn to death, and incur no bad consequences thereby. This is a clear failure. • The given dilemma is perfectly fair. What makes the bomb dilemma seem unfair to me is the fact that it’s conditioning on an extremely unlikely event. The only way we blow up is if the predictor predicted incorrectly. But by assumption, the predictor is near-perfect. So it seems implausible that this outcome would ever happen. • What makes the bomb dilemma seem unfair to me is the fact that it’s conditioning on an extremely unlikely event. Why is this unfair? Look, I keep saying this, but it doesn’t seem to me like anyone’s really engaged with it, so I’ll try again: If the scenario were “pick Left or Right; after you pick, then the boxes are opened and the contents revealed; due to [insert relevant causal mechanisms involving a predictor or whatever else here], the Left box should be empty; unfortunately, one time in a trillion trillion, there’ll be some chance mistake, and Left will turn out (after you’ve chosen it) to have a bomb, and you’ll blow up”… … then FDT telling you to take Left would be perfectly reasonable. I mean, it’s a gamble, right? A gamble with an unambiguously positive expected outcome; a gamble you’ll end up winning in the utterly overwhelming majority of cases. Once in a trillion trillion times, you suffer a painful death—but hey, that’s better odds than each of us take every day when we cross the street on our way to the corner store. In that case, it would surely be unfair to say “hey, but in this extremely unlikely outcome, you end up burning to death!”. But that’s not the scenario! In the given scenario, we already know what the boxes have in them. They’re open; the contents are visible. We already know that Left has a bomb. We know, to a certainty, that choosing Left means we burn to death. 
It’s not a gamble with an overwhelming, astronomical likelihood of a good outcome, and only a microscopically tiny chance of painful death—instead, it’s knowingly choosing a certain death!

Yes, the predictor is near-perfect. But so what? In the given scenario, that’s no longer relevant! The predictor has already predicted, and its prediction has already been evaluated, and has already been observed to have erred! There’s no longer any reason at all to choose Left, and every reason not to choose Left.

And yet FDT still tells us to choose Left. This is a catastrophic failure; and what’s more, it’s an obvious failure, and a totally preventable one.

Now, again: it would be reasonable to say: “Fine, yes, FDT fails horribly in this very, very rare circumstance; this is clearly a terrible mistake. Yet other decision theories fail, at least this badly, or in far more common situations, or both, so FDT still comes out ahead, on net.”

But that’s not the claim in the OP; the claim is that, somehow, knowingly choosing a guaranteed painful death (when it would be trivial to avoid it) is the correct choice, in this scenario. And that’s just crazy.

• My updated defense of FDT, should you be interested.

• But that’s not the claim in the OP; the claim is that, somehow, knowingly choosing a guaranteed painful death (when it would be trivial to avoid it) is the correct choice, in this scenario. And that’s just crazy.

Like I’ve said before, it’s not about which action to take, it’s about which strategy to have. It’s obvious right-boxing gives the most utility in this specific scenario only, but that’s not what it’s about.

• Why? Why is it not about which action to take?

It’s obvious right-boxing gives the most utility in this specific scenario only, but that’s not what it’s about.

I reject this. If Right-boxing gives the most utility in this specific scenario, then you should Right-box in this specific scenario.
Because that’s the scenario that—by construction—is actually happening to you. In other scenarios, perhaps you should do other things. But in this scenario, Right is the right answer.

• I reject this. If Right-boxing gives the most utility in this specific scenario, then you should Right-box in this specific scenario. Because that’s the scenario that—by construction—is actually happening to you. In other scenarios, perhaps you should do other things. But in this scenario, Right is the right answer.

And this is the key point. It seems to me impossible to have a decision theory that right-boxes in Bomb but still does as well as FDT does in all other scenarios.

• Why? Why is it not about which action to take?

It’s about which strategy you should adhere to. The strategy of right-boxing loses you $100 virtually all the time.

• If it’s about utility, then specify it in terms of utility, not death or dollars.

• Utility is often measured in dollars. If I had created the Bomb scenario, I would have specified life/death in terms of dollars as well. Like, “Life is worth $1,000,000 to you.” That way, you can easily compare the loss of your life to the $100 cost of Right-boxing.

• Look, I keep saying this, but it doesn’t seem to me like anyone’s really engaged with it, so I’ll try again:

If the scenario were “pick Left or Right; after you pick, then the boxes are opened and the contents revealed; due to [insert relevant causal mechanisms involving a predictor or whatever else here], the Left box should be empty; unfortunately, one time in a trillion trillion, there’ll be some chance mistake, and Left will turn out (after you’ve chosen it) to have a bomb, and you’ll blow up”…

… then FDT telling you to take Left would be perfectly reasonable. I mean, it’s a gamble, right? A gamble with an unambiguously positive expected outcome; a gamble you’ll end up winning in the utterly overwhelming majority of cases. Once in a trillion trillion times, you suffer a painful death—but hey, that’s better odds than each of us take every day when we cross the street on our way to the corner store. In that case, it would surely be unfair to say “hey, but in this extremely unlikely outcome, you end up burning to death!”.

But that’s not the scenario!

Yes, you keep saying this, and I still think you’re wrong. Our candidate decision theory has to recommend something for this scenario—and that recommendation gets picked up by the predictor beforehand. You have to take that into account. You seem to be extremely focused on this extremely unlikely scenario, which is odd to me.

And yet FDT still tells us to choose Left. This is a catastrophic failure; and what’s more, it’s an obvious failure, and a totally preventable one.

How exactly is it preventable? I’m honestly asking. If you have a strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT, I’m all ears.

• How exactly is it preventable? I’m honestly asking.

It’s preventable by taking the Right box. If you take Left, you burn to death. If you take Right, you don’t burn to death.

If you have a strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT, I’m all ears.

Totally, here it is:

FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead.

• FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead.

How would this work? Your strategy seems to be “Left-box unless the note says there’s a bomb in Left”. This ensures the predictor is right whether she puts a bomb in Left or not, and doesn’t optimize expected utility.

• It doesn’t kill you in a case when you can choose not to be killed, though, and that’s the important thing.

• It costs you p * $100 for 0 ≤ p ≤ 1, where p depends on how “mean” you believe the predictor is. Left-boxing costs 10^-24 * $1,000,000 = $10^-18 if you value life at a million dollars. Then if p > 10^-20, Left-boxing beats your strategy.

• Why would I value my life finitely in this case? (Well, ever, really, but especially in this scenario…)

• Also, were you operating under the life-has-infinite-value assumption all along? If so, then

1. You were incorrect about FDT’s decision in this specific problem

2. You should probably have mentioned you had this unusual assumption, so we could have resolved this discussion way earlier

1. Note that FDT Right-boxes when you give life infinite value.

2. What’s special in this scenario with regards to valuing life finitely?

3. If you always value life infinitely, it seems to me all actions you can ever take get infinite values, as there is always a chance you die, which makes decision making on basis of utility pointless.

• Totally, here it is:

FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead.

Unfortunately, that doesn’t work. The predictor, if malevolent, could then easily make you choose right and pay $100.

Left-boxing is the best strategy possible as far as I can tell. As in, yes, that extremely unlikely scenario where you burn to death sucks big time, but there is no better strategy possible (unless there is a superior strategy I—and it appears everybody—haven’t/hasn’t thought of).
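For reference, the expected-cost comparison made earlier in this exchange (p * $100 for the conditional strategy against 10^-24 * $1,000,000 for Left-boxing) can be sketched in a few lines of Python. This is only an illustration: the $1,000,000 value of life is the commenter's own assumption rather than part of the Bomb problem, and the function names are hypothetical.

```python
from fractions import Fraction

# Figures quoted in the thread. The finite value of life is an assumption
# made by one commenter; the Bomb problem itself does not state it.
RIGHT_BOX_FEE = Fraction(100)
VALUE_OF_LIFE = Fraction(1_000_000)
PREDICTOR_ERROR_RATE = Fraction(1, 10**24)


def cost_of_left_boxing():
    """Expected loss of the policy 'always take Left'."""
    return PREDICTOR_ERROR_RATE * VALUE_OF_LIFE


def cost_of_conditional_right_boxing(p):
    """Expected loss of 'take Right whenever a bomb is reported in Left'.

    p is the (hypothetical) probability that the predictor steers the
    agent into paying the fee.
    """
    return Fraction(p) * RIGHT_BOX_FEE


def break_even_p():
    """Value of p at which the two policies have equal expected loss."""
    return cost_of_left_boxing() / RIGHT_BOX_FEE


if __name__ == "__main__":
    print("Left-boxing expected loss:", cost_of_left_boxing())
    print("Break-even p:", break_even_p())
```

Exact fractions are used because quantities this small are easy to misread in floating point. Under these assumptions the break-even value of p is 10^-20, which is the threshold stated in the comment.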

• If you commit to taking Left, then the predictor, if malevolent, can “mistakenly” “predict” that you’ll take Right, making you burn to death. Just like in the given scenario: “Whoops, a mistaken prediction! How unfortunate and improbable! Guess you have no choice but to kill yourself now, how sad…”

There absolutely is a better strategy: don’t knowingly choose to burn to death.

• If you commit to taking Left, then the predictor, if malevolent, can “mistakenly” “predict” that you’ll take Right, making you burn to death.

We know the error rate of the predictor, so this point is moot.

There absolutely is a better strategy: don’t knowingly choose to burn to death.

I still have to see a strategy incorporating this that doesn’t overall lose by losing utility in other scenarios.

• We know the error rate of the predictor, so this point is moot.

How do we know it? If the predictor is malevolent, then it can “err” as much as it wants.

• For the record, I read Nate’s comments again, and I now think of it like this:

To the extent that the predictor was accurate in her line of reasoning, then you left-boxing does NOT result in you slowly burning to death. It results in, well, the problem statement being wrong, because the following can’t all be true:

1. The predictor is accurate

2. The predictor predicts you right-box, and places the bomb in left

3. You left-box

And yes, apparently the predictor can be wrong, but I’d say, who even cares? The probability of the predictor being wrong is supposed to be virtually zero anyway (although as Nate notes, the problem description isn’t complete in that regard).

• We know it because it is given in the problem description, which you violate if the predictor ‘can “err” as much as it wants’.

• Well, it’s only unlikely if the agent left-boxes. If she right-boxes, the scenario is very likely.

I don’t think the problem itself is unfair—what’s unfair is saying FDT is wrong for left-boxing.

• The given dilemma is perfectly fair.

For the record: I completely agree with Said on this specific point. Bomb is a fair problem. Each decision theory entering this problem gets dealt the exact same hand.

FDT recommends that you knowingly choose to burn to death, when you could instead not choose to burn to death, and incur no bad consequences thereby. This is a clear failure.

No. Ironically, Bomb is an argument for FDT, not against it: for if I adhere to FDT, I will never* burn to death AND save myself $100 if I do face this predictor.

*never here means only 1 in 1 trillion trillion if you meet the predictor

• If there is some nontrivial chance that the predictor is adversarial but constrained to be accurate and truthful (within the bounds given), then on the balance of probability people taking the right box upon seeing a note predicting right are worse off.

Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low.

This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.

Edit: The odds were not astronomically low. I misinterpreted the statement about Predictor’s accuracy to be stronger than it actually was. FDT recommends taking the right box, and paying $100.

• on the balance of probability people taking the right box upon seeing a note predicting right are worse off

No, because the scenario stipulates that you find yourself facing a Left box with a bomb. Anyone who finds themselves in this scenario is worse off taking Left than Right, because taking Left kills you painfully, and taking Right does no such thing. There is no question of any “balance of probability”.

Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low.

But you didn’t “get screwed”! You have a choice! You can take Left, or Right.

Again: the scenario stipulates that taking Left kills you, and FDT agrees that taking Left kills you; and likewise it is stipulated (and FDT does not dispute) that you can indeed take whichever box you like.

This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.

All of that is completely irrelevant, because in the actual world that you (the agent in the scenario) find yourself in, you can either burn to death, or not. It’s completely up to you. You don’t have to do what FDT says to do, regardless of what happens in any other possible worlds or counterfactuals or what have you.

It really seems to me like anyone who takes Left in the “Bomb” scenario is making almost exactly the same mistake as people who two-box in the classic Newcomb’s problem. Most of the point of “Newcomb’s Problem and Regret of Rationality” is that you don’t have to, and shouldn’t, do things like this.

But actually, it’s a much worse mistake! In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it. But here, there is no disagreement at all; FDT admits that choosing Left causes you to die painfully, but says you should do it anyway! That is obviously much worse.

The other point of “Newcomb’s Problem and Regret of Rationality” is that it is a huge mistake to redefine losing (such as, say, burning to death) as winning. That, also, seems like a mistake that’s being made here.

I don’t see that there’s any way of rescuing this result.

• According to me, the correct rejoinder to Will is: I have confidently asserted that X is false for X to which I assign much greater probability than 1 in a trillion trillion, and so I hereby confidently assert that no, I do not see the bomb on the left. You see the bomb on the left, and lose $100. I see no bombs, and lose $0.

I can already hear the peanut gallery objecting that we can increase the fallibility of the predictor to reasonable numbers and I’d still take the bomb, so before we go further, let’s all agree that sometimes you’re faced with uncertainty, and the move that is best given your uncertainty is not the same as the move that is best given perfect knowledge. For example, suppose there are three games (“lowball”, “highball”, and “extremeball”) that work as follows. In each game, I have three actions—low, middle, and high. In the lowball game, my payouts are $5, $4, and $0 respectively. In the highball game, my payouts are $0, $4, and $5 respectively. In the extremeball game, my payouts are $5, $4, and $5 respectively. Now suppose that the real game I’m facing is that one of these games is chosen at uniform random by unobserved die roll. What action should I choose? Clearly ‘middle’, with an expected utility of $4 (compared to $3.33 for either ‘low’ or ‘high’). And when I do choose middle, I hope we can all agree that it’s foul play to say “you fool, you should have chosen low because the game is lowball”, or “you fool, there is no possible world in which that’s the best action”, or “you idiot, that’s literally the worst available action because the game was extremeball”.

If I knew which game I was playing, I’d play the best move for that game. But insofar as I must enter a single action played against the whole mixture of games, I might have to choose something that’s not the best action in your favorite subgame.

With that in mind, we can now decompose Will’s problem with the bomb into two subgames that I’m bound to play simultaneously.

In one subgame (that happens with probability 2 in a trillion trillion, although feel free to assume it’s more likely than that), the predictor is stumped and guesses randomly. We all agree that in that subgame, the best action is to avoid the bomb.

In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
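The lowball/highball/extremeball example earlier in this comment is easy to check numerically. The following is a minimal Python sketch, not anything from the original comment: the payouts and the uniform die roll are taken from the text, while the dictionary layout and function names are made up for illustration.

```python
from fractions import Fraction

# Payouts per game and action, as stated in the comment.
PAYOUTS = {
    "lowball": {"low": 5, "middle": 4, "high": 0},
    "highball": {"low": 0, "middle": 4, "high": 5},
    "extremeball": {"low": 5, "middle": 4, "high": 5},
}
ACTIONS = ("low", "middle", "high")


def expected_payout(action):
    """Expected payout of one action when the game is picked uniformly."""
    weight = Fraction(1, len(PAYOUTS))
    return sum(weight * game[action] for game in PAYOUTS.values())


def best_action():
    """Action with the highest expected payout against the mixture."""
    return max(ACTIONS, key=expected_payout)


def best_actions_in(game_name):
    """Actions that are optimal if the game were known in advance."""
    game = PAYOUTS[game_name]
    top = max(game.values())
    return [a for a in ACTIONS if game[a] == top]


if __name__ == "__main__":
    for action in ACTIONS:
        print(action, float(expected_payout(action)))
    print("best against the mixture:", best_action())
```

Note that `best_actions_in` never returns 'middle' for any single game, even though 'middle' is the best single action against the mixture, which is the point the example is making.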

That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”. Similar to how if you say “assume you’re going to take the $5 bill, and you can either take the $5 bill or the $10 bill, but if you violate the laws of logic then you get a $100 fine, what do you do?” I can validly say “no”. It’s not my fault that you named a decision problem whose premises I can flatly refute.

Hopefully we all agree that insofar as the predictor is perfect (which, remember, is a case in the case analysis when the predictor is fallible), the problem statement here is deeply flawed, because I can by an action of mine refute it outright. The standard rejoinder is a bit of sleight-of-hand, where the person posing the problem says “ah, but the predictor is fallible”. But as we’ve already seen, I can just decompose it right back into two subproblems that we then aggregate across (much like the highball/lowball/extremeball case), at which point one of our case-analyses reveals that insofar as the predictor is accurate, the whole problem-statement is still flawed.

And this isn’t me saying “I wish to be evaluated from an epistemic vantage point that takes into account the other imaginary branches of reality”. This is me saying, your problem statement was wrong. It’s me pointing out that you’re a liar, or at least that I can by a clever choice of actions render you a liar. When you say “the predictor was accurate and you saw the bomb, what do you do?”, and I say “take the bomb”, I don’t get blown up, I reveal your mistake. Your problem statement is indeterminate. You shouldn’ta given me a problem I could refute. I’m not saying “there’s other hypothetical branches of reality that benefit from me taking this bomb”, I’m saying “WRONG, tell me what really happened”. Your story was false, my dude.

There’s some question of what to do when an obviously ill-formed game is mixed in with a properly-formed game, by, eg, adding some uncertainty about whether the predictor is fallible. Like, how are we supposed to analyze games comprising subgames where the problem statement can be refuted in one subgame but not others? And according to me, the obvious answer is that if you say “you are 1% playing problem A and 99% playing problem B”, and if I can by some act refute that I’m playing problem B, then I am perfectly licensed in saying “WRONG (99%)”. Mixing in a little uncertainty (or even a lot of uncertainty!) doesn’t stop you from being wrong (at my will) in the cases where you’re asserting falsehoods about my actions.

So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times). If instead (as is the oral tradition when someone leaves their decision problem indeterminate) counterfactually-spiting the predictor causes me to find myself in a room full of hornets rather than exits, then what really happened is that I saw no bomb (and no hornets), almost certainly.

If you want me to stop denying your problem-statements outright, you’ve gotta stop giving me problem statements that I can (probabilistically) refute by my actions.

• Thanks, this comment thread was pretty helpful.

After reading your comments, here’s my current explanation of what’s up with the bomb argument:

The problem with claiming that taking Left is wrong in the bomb-problem, is in the inference that “because I see the bomb is in the left box” this necessarily implies that “I am in the one-in-a-trillion-trillion situation where the predictor was inaccurate”.

However, this is forgetting the other option, where this setup is one of the vastly larger other worlds where I am being predicted by the predictor. In those worlds, it’s important that I make the worlds logically inconsistent by taking the left box, and so the predictor cannot accurately predict me taking the right box. Otherwise it may notice that this is a fixed point (i.e. leaving a note saying he predicted I’d take the right box does in fact lead to me taking the right box) and then call it a day and execute an action (that I don’t want) with an accurate prediction.

So there’s a 1-in-a-trillion-trillion chance that I am in the real game, and a notably higher chance that I’m in the mind of some agent predicting what I’ll do here. (And in that simulation-world, I don’t want to do the thing I don’t want them to predict that I’ll do.)

So make your estimates of the probability you’re being perfectly simulated and the probability you’re in the real game, and then compare them, multiply by utilities, and you’re done.

Then I’m a bit confused about how to estimate that probability, but I suspect the reasoning goes like this:

It’s near-certain that I will be simulated in some scenarios by the predictor, and it’s one-in-a-trillion-trillions that this is the real scenario. This scenario seems “plausible” as a scenario the predictor would simulate me in, especially given that the alternative ground-truth is that this is the scenario that they actually went with! I’m going to put it somewhere between “10%” and “100%”. So I think the odds ratio is around the order of magnitude of “one-in-ten” to “one-in-a-trillion-trillion”. And when I multiply them by the expected utility, the ratio is still well in favor of taking the bomb and making it very likely that in reality I will not lose 100 dollars.

Sanity check

As a sanity-check, I note this implies that if the utilities-times-probabilities are different, I would not mind taking the $100 hit. Let’s see what the math says here, and then check whether my intuitions agree.

Suppose I value my life at $1 million. Then I think that I should become more indifferent here when the probability of a mistaken simulation approaches 1 in 100,000, or where the money on the line is closer to $10^-17.

[You can skip this, but here’s me stating the two multiplications I compared:

• World 1: I fake-kill myself to save $X, with probability 1/10

• World 2: I actually kill myself (cost: $1MM), with probability 1/Y

To find the indifference point I want the two multiplications of utility-to-probability to come out to be equal. If X = $100, then Y equals 100,000. If Y is a trillion trillion (10^24), then X = $10^-17. (Unless I did the math wrong.)]

I think this doesn’t obviously clash with my intuitions, and somewhat matches them.

• If the simulator was getting things wrong 1 in 100,000 times, I think I’d be more careful with my life in the “real world case” (insofar as that is a sensible concept). Going further, if you told me they were wrong 1 in 10 times, this would change my action, so there’s got to be a tipping point somewhere, and this seems reasonable for many people (though I actually value my life at more than $1MM).

• And if the money was that tiny ($10^-17), I’d be fairly open to “not taking even the one-in-a-trillion-trillion chance”. (Though really my intuition is that I don’t care about money way before $10^-17, and would probably not risk anything serious starting at like 0.1 cents, because that sort of money seems kind of irritating to have to deal with. So my intuition doesn’t match perfectly here. Though I think that if I were expecting to play trillions of such games, then I would start to actively care about such tiny amounts of money.)
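The indifference calculation in this comment can be reproduced with a short Python sketch. It rests on two assumptions that are not stated outright in the text: that the probability of being in the predictor's simulation is taken to be 1/10 (the low end of the “10%” to “100%” range mentioned earlier in the comment), and that life is valued at $1 million. With those inputs the sketch matches the figures quoted in the comment ($100 against 1 in 100,000, and a trillion trillion against $10^-17). The function names are hypothetical.

```python
from fractions import Fraction

# Assumptions (see the lead-in): a $1MM value of life and a 1-in-10 chance
# of being inside the predictor's simulation.
VALUE_OF_LIFE = Fraction(1_000_000)
P_SIMULATED = Fraction(1, 10)


def indifference_error_odds(fee):
    """Return Y such that fee * P_SIMULATED == VALUE_OF_LIFE * (1 / Y)."""
    return VALUE_OF_LIFE / (Fraction(fee) * P_SIMULATED)


def indifference_fee(error_odds):
    """Return X such that X * P_SIMULATED == VALUE_OF_LIFE * (1 / error_odds)."""
    return VALUE_OF_LIFE / (Fraction(error_odds) * P_SIMULATED)


if __name__ == "__main__":
    print("Y for a $100 fee:", indifference_error_odds(100))
    print("X for Y = 10^24:", indifference_fee(10**24))
```

Changing `P_SIMULATED` toward 1 shifts both thresholds by at most a factor of ten, so the order-of-magnitude conclusion does not depend much on where in the stated range the estimate falls.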

• In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.

That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”.

Whether the predictor is accurate isn’t specified in the problem statement, and indeed can’t be specified in the problem statement (lest the scenario be incoherent, or posit impossible epistemic states of the agent being tested). What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation (from which you can perhaps infer additional things about the predictor, but that’s up to you).

In other words, the scenario is: as per the information you have, so far, the predictor has predicted 1 trillion trillion times, and been wrong once (or, some multiple of those numbers—predicted 2 trillion trillion times and been wrong twice, etc.).

You now observe the given situation (note predicting Right, bomb in Left, etc.). What do you do?

Now, we might ask: but is the predictor perfect? How perfect is she? Well… you know that she’s erred once in a trillion trillion times so far—ah, no, make that twice in a trillion trillion times, as of this iteration you now find yourself in. That’s the information you have at your disposal. What can you conclude from that? That’s up to you.

Likewise, you say:

So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times).

The problem statement absolutely is complete. It asks what you would/​should do in the given scenario. There is no need to specify what “would” happen in other (counterfactual) scenarios, because you (the agent) do not observe those scenarios. There’s also no question of what would happen if you “always spite the predictor’s prediction”, because there is no “always”; there’s just the given situation, where we know what happens if you choose Left: you burn to death.

You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.

• The problem statement absolutely is complete.

It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined.

(If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)

And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.

What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation

Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.

I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.

Now, we might ask: but is the predictor perfect? How perfect is she?
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.

And in the second case, your problem statement is revealed to be a lie.

Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending what you do”. That’s also fine. But insofar as you flatly postulate an (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.

You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.

There’s impossibility here precisely insofar as the predictor is accurate.

One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certainly get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.

(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)

Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)

To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?

• It’s not complete enough to determine what I do when I don’t see a bomb.

I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?

You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.

… flatly asserting consequences of my actions as if they’re facts …

Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)

(If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb [Left] if it’s present, but pays the $100 [Right] if it isn’t.)

Well, let’s see. The problem says:

If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.

So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
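A minimal sketch of that check, assuming the placement rule quoted above and treating the agent as a function from what it sees to a choice. The names `bomb_in_left`, `consistent_predictions`, `contrarian`, and `always_right` are made up for illustration and appear nowhere in the problem statement:

```python
from typing import Callable, List

# A policy maps "is a bomb visible in Left?" to the box the agent takes.
Policy = Callable[[bool], str]


def bomb_in_left(prediction: str) -> bool:
    """Placement rule from the problem: bomb in Left iff the prediction is 'Right'."""
    return prediction == "Right"


def consistent_predictions(policy: Policy) -> List[str]:
    """Return every prediction that the policy would then make come true."""
    return [p for p in ("Left", "Right") if policy(bomb_in_left(p)) == p]


def contrarian(bomb_present: bool) -> str:
    # The agent from the parenthetical: Left if the bomb is there, Right if not.
    return "Left" if bomb_present else "Right"


def always_right(bomb_present: bool) -> str:
    # For comparison: an agent that pays the $100 no matter what it sees.
    return "Right"


print(consistent_predictions(contrarian))    # []
print(consistent_predictions(always_right))  # ['Right']
```

Under these assumptions the conditional agent leaves the predictor with no prediction that comes out true, while the agent that always takes Right admits exactly one.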

I… don’t see what bearing this has on the disagreement, though.

Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.

I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.

What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.

I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.

And in the second case, your problem statement is revealed to be a lie.

I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…

Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?

There’s impossibility here precisely insofar as the predictor is accurate.

Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)

One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”.

Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.

… you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/​low/​extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.

But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!

… in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life …

… or, you could just… choose Right. That seems to me to be a clear win.

Separately, I note that if you think an agent should behave differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a googolplex, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)

If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/​or one’s beliefs about what scenario one finds oneself in…

To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?

How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)

• The scenario does not appear to be in any way impossible.

The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.

This appears to be paradoxical, but that seems to me to be the predictor’s fault

Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.

In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.

Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.

I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.

You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empiricism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).

...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.

(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)

How would I come to the belief that the predictor is literally perfect (or even mostly perfect)?

I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.

Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a nonsequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain.

And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?

• The scenario says “the predictor is likely to be accurate”

Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.

and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself.

What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.

Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…

I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.

I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.

...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.

I am not familiar with this, no. If you have explanatory material /​ intuition pumps /​ etc. to illustrate this, I’d certainly appreciate it!

Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a nonsequitur.

I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).

Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.

Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?

The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…

• that seems like an unnecessarily vague characterization of a precise description

I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.

(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)

I am not familiar with this, no. If you have explanatory material /​ intuition pumps /​ etc. to illustrate this, I’d certainly appreciate it!

The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)

I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).

Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.

Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time, you’ve only gotta do slightly better than that.

I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.

And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.

Furthermore, I note that in the high/​low/​extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
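To make the case-by-case point concrete, here is a small sketch with made-up numbers. The rules and payoffs of the high/low/extremeball game are not given in this thread, so the `PAYOFFS` table below is purely hypothetical; only the roughly one-in-three weight on each game comes from the comment above. It analyzes each game separately and then aggregates over the mixture:

```python
from fractions import Fraction

# HYPOTHETICAL payoffs, invented for illustration; the thread does not specify them.
PAYOFFS = {
    "highball":    {"high": 10, "middle": 8, "low": 0},
    "lowball":     {"high": 0,  "middle": 8, "low": 10},
    "extremeball": {"high": 9,  "middle": 8, "low": 9},
}

# Each game is taken to be equally likely (the "~33%" in the comment above).
PRIOR = {game: Fraction(1, 3) for game in PAYOFFS}


def best_actions(game: str) -> set:
    """Case analysis: the best action(s) insofar as this particular game is being played."""
    top = max(PAYOFFS[game].values())
    return {action for action, value in PAYOFFS[game].items() if value == top}


def expected_payoff(action: str) -> Fraction:
    """Aggregate the per-game analyses, weighting each game by its probability."""
    return sum(PRIOR[game] * PAYOFFS[game][action] for game in PAYOFFS)


def best_single_action() -> str:
    """The best action for someone who must submit one action against the mixture."""
    return max(PAYOFFS["highball"], key=expected_payoff)


for game in PAYOFFS:
    print(game, sorted(best_actions(game)))
print(best_single_action(), expected_payoff(best_single_action()))
```

With these invented numbers, 'middle' is not the best action in any individual game, yet it is the best single action against the mixture. Answering "what should I do insofar as the game is highball" never required knowing how likely highball is.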

And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?

• Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.

Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:

1. The predictor thought I would take the right box, and was correct.

2. The predictor thought I would take the right box, and was incorrect.

Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
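As a rough sketch of that freedom, assume the agent is shown the prediction before it chooses. The two response rules below, called `confirm` and `falsify` here purely for illustration, either take the predicted box or the other one, and the guesser is deliberately just a coin flip:

```python
import random
from typing import Callable


def confirm(prediction: str) -> str:
    """Take whichever box was predicted, realizing possibility 1."""
    return prediction


def falsify(prediction: str) -> str:
    """Take the box that was not predicted, realizing possibility 2."""
    return "Left" if prediction == "Right" else "Right"


def coin_flip_guess() -> str:
    """A guesser with no insight into the agent at all."""
    return random.choice(["Left", "Right"])


def hit_rate(guess: Callable[[], str], respond: Callable[[str], str], trials: int = 1000) -> float:
    """Fraction of trials in which the agent's choice matches the announced guess."""
    hits = 0
    for _ in range(trials):
        prediction = guess()
        if respond(prediction) == prediction:
            hits += 1
    return hits / trials


print(hit_rate(coin_flip_guess, confirm))  # 1.0
print(hit_rate(coin_flip_guess, falsify))  # 0.0
```

Because each response is a function of the announced prediction, whether the prediction comes out correct is settled by the responder, not by how the guess was produced.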

Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.

Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?

It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.

This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.

Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.

With this as a prerequisite, we are now equipped to address your next question:

What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.

(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)

Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?

At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.

I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!

Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?

Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time, will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:

• The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any relatives.
• Any strategies that behave like SpiteBot some of the time, will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will not encounter any bomb-like scenarios on Wednesday.
• If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.

(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)

• Yes, it was this exact objection that I addressed in my previous replies that relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.

The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.

This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.

So the whole scenario is pointless. It doesn’t explore what it was intended to explore.

You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.

• There’s also no question of what would happen if you “always spite the predictor’s prediction”

There IS a question of what would happen if you “always spite the predictor’s prediction”, since doing so seems to make the 1 in a trillion trillion error rate impossible.

• In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it.

To be clear, FDT does not accept causation that happens backwards in time. It’s not claiming that the action of one-boxing itself causes there to be a million dollars in the box. It’s the agent’s algorithm, and, further down the causal diagram, Omega’s simulation of this algorithm that causes the million dollars. The causation happens before the prediction and is nothing special in that sense.

• Yes, sure. Indeed we don’t need to accept causation of any kind, in any temporal direction. We can simply observe that one-boxers get a million dollars, and two-boxers do not. (In fact, even if we accept shminux’s model, this changes nothing about what the correct choice is.)

• We can simply observe that one-boxers get a million dollars, and two-boxers do not.

Eh? This kind of reasoning leads to failing to smoke on Smoking Lesion.

• The main point of FDT is that it gives the optimal expected utility on average for agents using it. It does not guarantee optimal expected utility for every instance of an agent using it.

Suppose you have a population of two billion agents, each going through this scenario every day. Upon seeing a note predicting right, one billion would pick left and one billion would pick right. We can assume that they all pick left if they see a note predicting left or no note at all.

Every year, the Right agents essentially always see a note predicting right, and pay more than $30000 each. The Left agents essentially always see a note predicting left (or no note) and pay $0 each.

The average rate of deaths is comparable: one death per few trillion years in each group, which is to say, essentially never. They all know that it could happen, of course.

Which group is better off?

Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.

• Obviously, the group that’s better off is the third group: the one that picks Left if there’s no bomb in there, Right otherwise.

… I mean, seriously, what the heck? The scenario specifies that the boxes are open! You can see what’s in there! How is this even a question?

(Bonus question: what will the predictor say about the behavior of this third group? What choice will she predict a member of this group will make?)

• Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.

Two questions, if I may:

1. Why do you read it this way? The problem simply states the failure rate is 1 in a trillion trillion.

2. If we go with your interpretation, why exactly does that change things? It seems to me that the sample size would have to be extremely huge in order to determine a failure rate that low.

• It depends upon what the meaning of the word “is” is:

1. The failure rate has been tested over an immense number of predictions, and evaluated as 10^-24 (to one significant figure). That is the currently accepted estimate for the predictor’s error rate for scenarios randomly selected from the sample.

2. The failure rate is theoretically 10^-24, over some assumed distribution of agent types. Your decision model may or may not appear anywhere in this distribution.

3. The failure rate is bounded above by 10^-24 for every possible scenario.

A self-harming agent in this scenario cannot be consistently predicted by Predictor at all (success rate 0%), so we know that (3) is definitely false.

(1) and (2) aren’t strong enough, because they give little information about Predictor’s error rate concerning your scenario and your decision model.

We have essentially zero information about Predictor’s true error bounds regarding agents that sometimes carry out self-harming actions. In order to recommend taking the left box, an FDT agent is one that sometimes carries out self-harming actions, though this requires that the upper bound on Predictor’s failure of subjunctive dependency is less than the ratio of the utilities of: paying $100, and burning to death all intelligent life in the universe.

We do not have anywhere near enough information to justify that tight a bound. So FDT can’t recommend such an action. Maybe someone else can write a scenario that is in a similar spirit, but isn’t so flawed.
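The arithmetic behind this sub-thread can be sketched in a few lines. This is only an illustration under stated assumptions: the $100 fee, the 10^-24 error rate, the one billion agents per group and the daily repetition come from the comments above, while `death_cost` (a dollar-equivalent disutility of dying) is a hypothetical parameter that the scenario never specifies, and all function names are made up for this sketch.

```python
# Figures taken from the thread: $100 fee, 1e-24 error rate,
# one billion agents per group, one scenario per agent per day.
FEE = 100
ERROR_RATE = 1e-24
AGENTS_PER_GROUP = 1_000_000_000
DAYS_PER_YEAR = 365


def yearly_cost_right_agent(fee=FEE, days=DAYS_PER_YEAR):
    """Dollars paid per year by an agent who takes Right every day."""
    return fee * days


def years_per_death_left_group(error_rate=ERROR_RATE,
                               agents=AGENTS_PER_GROUP,
                               days=DAYS_PER_YEAR):
    """Expected years between deaths among Left-takers.

    A Left-taker dies only when the predictor mispredicts them.
    """
    expected_deaths_per_year = agents * days * error_rate
    return 1 / expected_deaths_per_year


def break_even_error_rate(death_cost, fee=FEE):
    """Largest error rate at which committing to Left still beats paying the fee.

    death_cost is a hypothetical dollar-equivalent disutility of dying;
    the scenario does not give one.
    """
    return fee / death_cost
```

Under these assumptions, `yearly_cost_right_agent()` gives the "more than $30000" figure, and `years_per_death_left_group()` lands in the "few trillion years" range. `break_even_error_rate` makes the objection concrete: the larger the assumed disutility of death, the tighter the error bound that would have to hold for this particular agent and scenario.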

• Thanks, I appreciate this. Your answer clarifies a lot, and I will think about it more.

• Another way of phrasing it: you don’t get the $100 marginal payoff if you’re not prepared to knowingly go to your death in the incredibly unlikely event of a particular type of misprediction. That’s the sense in which I meant “you got screwed”. You entered the scenario knowing that it was incredibly unlikely that you would die regardless of what you decide, but were prepared to accept that incredibly microscopic chance of death in exchange for keeping your $100. The odds just went against you.

Edit: If Predictor’s actual bound on error rate was 10^-24, this would be valid. However, Predictor’s bound on error rate cannot be 10^-24 in all scenarios, so this is all irrelevant. What a waste of time.

• These arguments, the Bomb argument and Torture versus Dust Specks, suffer from an ambiguity between telling the reader what to do given their existing UF/preferences, telling the reader to have a different UF, and saying what an abstract agent, but not the reader, would do.

Suppose the reader has a well-defined utility function where death or torture are set to minus infinity. Then the writer can’t persuade them to trade off death or torture against any finite amount of utility. So, in what sense is the reader wrong about their own preferences?

Maybe their preferences don’t have the right mathematical structure to be a utility function in the technical sense. But then why would someone reformat their preferences into an ideal von Neumann form? What’s the advantage? It’s sometimes said that coherent preferences prevent you from being money pumped, or Dutch booked. But that doesn’t sound nearly as bad as being killed or tortured. If a perfectly rational decision theorist would accept being killed or tortured, then I don’t want to be one.

Or maybe these arguments just describe ideal rationalists, and aren’t intended to persuade the reader at all.

• I think sometimes writers mix up moral theories with decision theories.

Decision theory problems are best expressed using reasonably modest amounts of money, because even if readers don’t themselves have linear utility of money over that range, it’s something that’s easily imagined.

Moral theories are usually best expressed in non-monetary terms, but going straight to torture and murder is pretty lazy in my opinion. Fine, they’re things that most people think are “generally wrong” without being politically hot, but they still seem to bypass rationality, which makes discussion go stupid.

This bomb example did the stupid thing of including torture and death and annihilation of all intelligent life in the universe balanced against money and implausibly small probabilities and a bunch of other crap, and also left such huge holes in the specification that their argument didn’t even work. Pretty much a dumpster fire of what not to do in illustrating some fine points of decision theory.

• I don’t think morality enters into this at all. I don’t see any moral concerns in the described scenario, only prudential ones (i.e., concerns about how best to satisfy one’s own values).

As such, your reply seems to me to be non-responsive to TAG’s comment…

• TAG’s comment was in part about the ambiguity between telling the reader what to do given their existing UF/​preferences, and telling the reader to have a different UF. The former is an outcome of recommending a decision theory, while the latter is the outcome of recommending a moral theory. Hence my comment about how to recognize distinctions between them as a reader, and differences in properties of the scenarios that are relevant as a writer.

I also evaluated this scenario (and implicitly, torture vs dust specks) for how well it illustrates decision theory aspects, and found that it does so poorly in the sense that it includes elements that are more suited to moral theory scenarios. I hoped this would go some way toward explaining why these scenarios might indeed seem ambiguous between telling the reader what to do, and telling the reader what to value.

• TAG’s comment was in part about the ambiguity between telling the reader what to do given their existing UF/​preferences, and telling the reader to have a different UF. The former is an outcome of recommending a decision theory, while the latter is the outcome of recommending a moral theory.

I don’t think this is right. Suppose that you prefer apples to pears, pears to grapes, and grapes to apples. I tell you that this is irrational (because intransitive), and that you should alter your preferences, on pain of Dutch-booking (or some such).

Is that a moral claim? It does not seem to me to be any such thing; and I think that most moral philosophers would agree with me…
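To make the "Dutch-booking (or some such)" remark concrete, here is a minimal money-pump sketch. It assumes an agent with exactly the cyclic preferences described above who will pay a small fee for any swap it strictly prefers; the fee, the starting wealth and the function name are illustrative assumptions, not part of the original comment.

```python
# Cyclic (intransitive) preferences from the comment:
# apples > pears, pears > grapes, grapes > apples.
# Each pair is (preferred, less_preferred).
PREFERS = {("apples", "pears"), ("pears", "grapes"), ("grapes", "apples")}


def money_pump(start_item, offers, fee, wealth):
    """Offer a sequence of swaps; the agent pays `fee` for each swap it prefers.

    Returns the agent's final item and remaining wealth.
    """
    item = start_item
    for offered in offers:
        if (offered, item) in PREFERS:  # agent strictly prefers the offered item
            item = offered
            wealth -= fee
    return item, wealth


# One trip around the cycle: pears -> apples -> grapes -> pears.
final_item, final_wealth = money_pump(
    "pears", ["apples", "grapes", "pears"], fee=1, wealth=10
)
```

After one trip around the cycle the agent holds the fruit it started with and has paid the fee three times. Nothing in the sketch refers to anyone's welfare but the agent's own, which is the sense in which the criticism is prudential rather than moral.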

• Sure, there are cases that aren’t moral theory discussions in which you might be told to change your values. I didn’t claim that my options were exhaustive, though I did make an implicit claim that those two seemed to cover the vast majority of potential ambiguity in cases like this. I still think that claim has merit.

More explicitly, I think that the common factor here is assuming some utility to the outcomes that has a finite ratio, and arriving at an unpalatable conclusion. Setting aside errors in the presentation of the scenario for now, there are (at least) two ways to view the outcome:

1. FDT says that you should let yourself burn to death in some scenario, because the ratio of disutility of burning to death vs paying $100 is not infinite. This is ridiculous, therefore FDT is wrong.

2. FDT says that you should let yourself burn to death in some scenario, because the ratio of disutility of burning to death vs paying $100 is not infinite. This is ridiculous, therefore the utilities are wrong.

Questions like “is an increased probability (no matter how small) of someone suffering a horrible painful death always worse than a moderate amount of money” are typical questions of moral theory rather than decision theory.

The ambiguity would go away if the stakes were simply money on both sides.

• Questions like “is an increased probability (no matter how small) of someone suffering a horrible painful death always worse than a moderate amount of money” are typical questions of moral theory rather than decision theory.

Er, no. I don’t think this is right either. Since “someone” here refers to yourself, the question is: “is an increased probability (no matter how small) of you suffering a horrible painful death, always worse than a moderate amount of money?” This is not a moral question; it’s a question about your own preferences.

(Of course, it’s also not the question we’re being asked to consider in the “Bomb” scenario, because there we’re not faced with a small probability of horrible painful death, or a small increase in the probability of horrible painful death, but rather a certain horrible painful death; and we’re comparing that to the loss of a moderate amount of money. This also is not a moral question, of course.)

The ambiguity would go away if the stakes were simply money on both sides.

Well, first of all, that would make the problem less interesting. And surely we don’t want to say “this decision theory only handles questions of money; as soon as we ask it to evaluate questions of life and death, it stops giving sensible answers”?

Secondly, I don’t think that any problem goes away if there’s just money on both sides. What if it were a billion dollars you’d have to pay to take Left, and a hundred to take Right? Well… in that case, honestly, the scenario would make even less sense than before, because:

1. What if I don’t have a billion dollars? Am I now a billion dollars in debt? To whom? I’m the last person in existence, right?

2. What’s the difference between losing a hundred dollars and losing a billion dollars, if I’m the only human in existence? What am I even using money for? What does it mean to say that I have money?

3. Can I declare myself to be a sovereign state, issue currency (conveniently called the “dollar”), and use it to pay the boxes? Do they have to be American dollars? Can I be the President of America? (Or the King?) Who’s going to dispute my claim?

And so on…

• I’ve posted a similar scenario which is based purely on money here.

I avoid “burning to death” outcomes in my version because some people do appear to endorse theoretically infinite disutilities for such things, even when they don’t live by such. Likewise there are no insanely low probabilities of failure that are mutually contradictory with other properties of the scenario.

It’s just a straightforward scenario in which FDT says you should choose to lose $1000 whenever that option is available, despite always having an available option to lose only $100.

• Well, in that case Omega’s prediction and your decision (one-boxing or two-boxing) aren’t subjunctively dependent on the same function. And this kind of dependence is key in FDT’s decision to one-box! Without it, FDT recommends two-boxing, like CDT.

In Newcomb’s problem, Omega is a perfect predictor, not just a very good one. Subjunctive dependence is necessarily also perfect in that case.

If Omega is imperfect in various ways, their predictions might be only partially, or not at all, subjunctively dependent upon your decision, and below some point on this scale FDT will start recommending two-boxing, as it should.

Omega can still be a nearly perfect predictor by some measures while still having zero subjunctive dependence. Conversely, even a comparatively poor predictor can still have enough subjunctive dependence that you should one-box (it only takes 0.1%).
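One way to see where a figure like 0.1% could come from is a simple mixture model. The sketch below is an assumption-laden illustration, not the commenter's own calculation: with probability `dependence` the prediction matches the agent's actual choice, and otherwise it is an independent guess that says "one-box" with probability `base_rate`. The $1,000 transparent box is the standard Newcomb value and is not stated in this thread; all names are hypothetical.

```python
MILLION = 1_000_000   # opaque box, filled iff one-boxing was predicted
THOUSAND = 1_000      # transparent box (standard value; not stated in the thread)


def expected_payoffs(dependence, base_rate=0.5):
    """Return (EU of one-boxing, EU of two-boxing) under the mixture model.

    With probability `dependence` the prediction matches the agent's choice;
    otherwise it is an independent guess that says "one-box" with
    probability `base_rate`. Both parameters are illustrative assumptions.
    """
    independent_part = (1 - dependence) * base_rate * MILLION
    one_box = dependence * MILLION + independent_part
    two_box = THOUSAND + independent_part
    return one_box, two_box


def break_even_dependence():
    """Dependence above which one-boxing has the higher expected payoff."""
    return THOUSAND / MILLION
```

In this model the independent part cancels out of the comparison, so the break-even point is simply $1,000 / $1,000,000 = 0.1%, whatever the base rate. Setting `dependence=0.0` with a high `base_rate` also illustrates the other half of the claim: a predictor that always guesses "one-box" can look very accurate on a population of one-boxers while offering no reason to one-box.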

In reality you have other actions available than just “one box” or “two box”. You may be able to change things about yourself so that Omega will be more likely to predict you to one-box, which may be worthwhile depending on the cost of whatever actions you take. Increasing your chance of an extra million dollars is probably worth some effort.

While technically within the scope of decision theory, any such actions are likely to be dependent upon fiddly details of Omega’s prediction processes and too annoying to model in toy problems. However, the existence of such actions is relevant to related fields in psychology. A great deal of human labour seems to go into efforts to shift others’ predictions about their behaviour. Some of these actions might actually change their future behaviour in a direction that aligns with the changed predictions to some extent (even if unintended), while others do not.

• At first this bomb scenario looked like an interesting question, but too much over-specification in some respects and vagueness in others means that in this scenario FDT recommends taking the right box, not left as claimed.