[Question] What are the two contradictory theories of how to evaluate counterfactuals?

Said Achmiz25 Jul 2025 18:43 UTC

29 points

Counterfactuals Rationality World Modeling

In this comment thread on the 2021 post “A Defense of Functional Decision Theory”, @So8res wrote:

...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.

Following up on my reply that I didn’t know what he was talking about, he recommended some reading:

The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)

I followed both of those links, and was not enlightened. The linked FDT paper had this bit:

In short, CDT and FDT both construct counterfactuals by performing a surgery on their world-model that breaks some correlations and preserves others, but where CDT agents preserve only causal structure in their hypotheticals, FDT agents preserve all decision-relevant subjunctive dependencies in theirs.

(This follows a rather technical section, which I have little confidence in having understood correctly.)

Has the dichotomy that @So8res refers to been clearly explained anywhere? If not—can anyone explain it now? The relevant questions are:

What are the “two different internally-consistent but contradictory theories of how to evaluate counterfactuals”?
What are the intuitions for each (which humans come equipped with)?
What problems pump those intuitions?

Said Achmiz25 Jul 2025 18:43 UTC

29 points

16 comments1 min readLW link

Counterfactuals Rationality World Modeling

mesaoptimizer 25 Jul 2025 19:25 UTC
13 points
0
1. The first intuition is that the counterfactual involves changes the physical result of your decision making, not the process of your decision making itself. The second intuition is that the counterfactual involves a replacement of the process of your decision making such that you’d take another action than you would normally do.
2. I imagine it as the following:
  - Physical intervention: I imagine that I’m possessed by a demon that leads me to take the physical actions to choose another option than I would have voluntarily.
  - Logical intervention: I imagine that I was a different person with a different life history, that would have led me to choose a different path than the me in physical reality would choose. This doesn’t quite communicate how loopy logical intervention can feel, however: I usually imagine logical alternative futures as ones where you effectively have 2+2=3 or something equally clearly illogical as a part of the bedrock of the universe.
3. I don’t think that different problems lead one to develop different intuitions. I think that physical intervention is the more intuitive way people relate to counterfactuals, including for mundane decision theory problems like Newcomb’s problem, and that logical intervention is something people need clarifying thought experiments to get used to. I found Counterlogical Mugging (which is counterfactual mugging but involves a statement you have logical uncertainty over) as a very useful intuition pump to start thinking in terms of logical intervention as a counterfactual.
For a more rigorous explanation, here’s the relevant section from MacDermott et al., “Characterising Decision Theories with Mechanised Causal Graphs”:

But in the Twin Prisoner’s Dilemma, one might interpret the policy node in two diﬀerent ways, and the interpretation will aﬀect the causal structure. We could interpret intervening on your policy ˜D as changing the physical result of the compilation of your source code, such that an intervention will only aﬀect your decision D, and not that of your twin T . Under this physical notion of causality, we get ﬁg. 3a, where there is a common cause S explaining the correlation between the agent’s policy and its twin’s.

But on the other hand, if we think of intervening on your policy as changing the way your source code compiles in all cases, then intervening on it will aﬀect your opponent’s policy, which is compiled from the same code. In this case, we get the structure shown in ﬁg. 3b, where an intervention on my policy would aﬀect my twin’s policy. We can view this as an intervention on an abstract “logical” variable rather than an ordinary physical variable. We therefore call the resulting model a logical-causal model.

Pearl’s notion of causality is the physical one, but Pearl-style graphs have also been used in the decision theory literature to represent logical causality. One purpose of this paper is to show that mechanism variables are a useful addition to any graphical model being used in decision theory.
What links here?
- What are the two contradictory theories of how to evaluate counterfactuals? by Said Achmiz (25 Jul 2025 18:43 UTC; 29 points)
- philh 26 Jul 2025 9:57 UTC
  4 points
  0
  Parent
  The first intuition is that the counterfactual involves changes the physical result of your decision making, not the process of your decision making itself. The second intuition is that the counterfactual involves a replacement of the process of your decision making such that you’d take another action than you would normally do.
  Hm, this makes me realize I’m not fully sure what’s meant by “counterfactual” here.
  I normally thinking of it as, like. I’m looking at a world history, e.g. with variables $A$ and $B$ and times $t = 0, 1, 2$ and some relationships between them. And I say “at $t = 1$ , $A$ took value $a$ . What if at $t = 1$ , I had taken value $a^{'}$ instead? What would that change at $t = 2$ ?” It’s clear how to fit decisions I’ve made in the past into that framework.
  Or I can run it forwards, looking from $t = 0$ to $t = 1, 2$ , imagining what’s going to happen by default, and imagining what happens if I make a change at some point. It’s less clear how to fit my own decisions into this framework, because what does “by default” mean then? But I can just pick some decision to plug in at every point where I get to make one, and say that all of these picks give me a counterfactual. (And perhaps by extension, if there are no decision points, I should also consider the imagined “what’s going to happen by default” world to be a counterfactual.)
  But if the discussion of counterfactuals starts by talking about decisions I’ve made, or are going to make, then it’s not clear to me whether it can be extended to talk about general interventions on world histories.
  I think that the first intuition corresponds to “interventions on causal models using the do operator”. That’s something I don’t think I understand deeply, but I do think I get the basics of, like, “what is this field of study trying to do, what questions is it asking, what sorts of objects does it work with and how do we manipulate them”. (E.g. if this is what we’re doing, then we say “we’re allowed to just set $A = a^{'}$ at $t = 1$ , we don’t need to go back to $t = 0$ and figure out how that state of affairs could have come about”.)
  Does the second intuition correspond to something that we can talk about without talking about my decisions? (And if so, is it a different thing than the first intuition? Or is it, like, they both naturally extend to a world with no decision points for me, but the way they extend to that is the same in those worlds, and so they only differ in worlds that do have decision points for me?)
- Said Achmiz 25 Jul 2025 19:33 UTC
  4 points
  0
  Parent
  Thank you! That’s definitely more clear than anything I’ve read about this on LW to date!
  
  Follow-up question that immediately occurs to me:
  
  Why are these two ways of evaluating counterfactuals and not, like… “answers to two different questions”? What I mean is: if we want to know what would happen in a “counterfactual” case, it seems like the first thing to do is to say “now, by that do you mean to ask what would happen under physical intervention, or what would happen under logical intervention?” Right? Those would (could?) have different answers, and really do seem like different questions, so after realizing that they’re different questions, have we thereby resolved all confusions about “counterfactuals”? Or do some puzzles remain?
  - mesaoptimizer 25 Jul 2025 20:12 UTC
    4 points
    0
    Parent
    
    What I mean is: if we want to know what would happen in a “counterfactual” case, it seems like the first thing to do is to say “now, by that do you mean to ask what would happen under physical intervention, or what would happen under logical intervention?” Right?
    
    Yes.
    
    Those would (could?) have different answers, and really do seem like different questions, so after realizing that they’re different questions, have we thereby resolved all confusions about “counterfactuals”?
    
    I think that intervening on causality and logic are the only two ways one could intervene to create an outcome different from the one that actually occurs.
    
    Or do some puzzles remain?
    
    I don’t work in the decision theory field, so I want someone else to answer this question.
Richard_Kennaway 26 Jul 2025 10:34 UTC
5 points
1
There’s a range of interpretations for any counterfactual. One must open up the “suppose” and ask, “What am I actually being asked to suppose? How might the counterfactual circumstance have come to be?” We can accordingly do surgery on the causal graph in different places, depending on how far back from the event of interest we intervene.

To make X counterfactually have some value x, we might, in terms of causal graph surgery, consider do(X=x). Or we might intervene on some predecessors of X, and use do(Y=y) and do(Z=z), choosing values which cause X to take on the value x, but which may have additional effects. Or we could intervene further back than that, and create even more side-effects. We might discover that we are considering a counterfactual that makes no sense — for example, phosphorus matches that do not burn, yet human life continues.

In Newcomb’s Problem, the two-boxing argument intervenes on both the decision of the person faced with the problem, and Omega’s decision to fill the other box or not, as if there were hidden mechanisms that were to pre-empt both decisions in each of the four possible ways they might be made. (This obviously contradicts one of the hypotheses of the problem, which is that Omega is always right.) The one-boxing argument intervenes on the choice of policy that produces the subject’s decision, and does not intervene on Omega.

I could call these CDT and FDT respectively, except for the tendency of people to modify their preferred decision theory xDT in response to problems that it gets wrong, and claim to be still using xDT, “properly understood”. I just described the one-boxer’s argument in causal terms. That does not mean that CDT, “properly understood”, is FDT.

ETA: While googling something about counterfactuals, I came across Molinism, according to which God knows all counterfactuals, and in particular knows what the creatures that he created would do of their own free will in any hypothetical situation. Omega is probably an angel sent by God to test people’s rationality. (Epistemic status: jeu d’esprit.)
Heighn 21 Aug 2025 14:58 UTC
3 points
0
Regarding (2), I interpret Soares’ point as follows: there’s a “CDT intuition” and an “FDT intuition” of how to evaluate counterfactuals. Let’s just take the Bomb problem as an example.
CDT intuition
The CDT intuition deals with which action has the best causal consequences at one given point in time. In Bomb, you can either Left-box or Right-box. There’s a bomb in Left, so Left-boxing causes you to die painfully. Right-boxing costs only $100, so right-boxing wins.
FDT intuition
The FDT intuition deals with which decision is the best outcome of your decision procedure. Your decision procedure is a function, and could be implemented more than once. In Bomb, it’s implemented both in your head and in the predictor’s head (she executed it to predict what you would do). Your decision to Left-box or to Right-box therefore happens twice—and, since your decision procedure is a function, it’s necessarily the same on both events—and you have to look at the causal consequences of both events. Left-boxing causes the predictor to not put a bomb in Left and you to not lose $100, while Right-boxing causes the predictor to put a bomb in Left and you to lose $100. Left-boxing wins.
P.S. A natural thing to respond here is something like: “But you already see a Bomb in Left, so the FDT intuition makes no sense!” But note that, since the predictor simulates you in order to make her prediction, you don’t actually know whether you are the “real you” or the “simulated you”, since the simulated you observes the exact same (relevant) things as the real you. (If not, then the simulation would not be accurate and there would be no subjunctive dependence.) So in this intuition, observing the bomb does not actually mean there is a bomb, since you could be in a simulation. In fact, you are, at different points in time, both in the simulation and in the “real” situation, and you have to make a decision that happens in and makes the best of both these situations.
- Said Achmiz 21 Aug 2025 19:04 UTC
  3 points
  1
  Parent
  Thanks! This seems to match up to what @mesaoptimizer wrote in his comment, I think?
  
  One question I do have is: does anyone actually have the “FDT intuition”…? That is, is it really an intuition, or is it a perspective that people need to be reasoned into taking?
  
  (I also have some serious problems with the FDT view, but this is not the place to discuss them, of course.)
  - sunwillrise 21 Aug 2025 19:10 UTC
    11 points
    0
    Parent
    That is, is it really an intuition, or is it a perspective that people need to be reasoned into taking?
    There’s no natural separation between the two. Reasoning and training chisels and changes intuition (S1) just as much as it chisels and changes deliberate thinking (S2).
    Take the example of chess. A grandmaster would destroy me, 10 games out of 10, when playing a classical game. But he would also destroy me 10 games out of 10 when we play a hyperbullet (i.e., 30+0 seconds) game, where the time control is so fast that you simply don’t have time to deliberately analyze variations at all and must instead play almost solely on intuition.^[1] That’s because the grandmaster’s intuition is far far better than mine.
    But the grandmaster was not born with any chess intuition. He was born not knowing anything about the existence of chess, actually. He had to be trained, and to train himself, into it. And through the process of studying chess (classical chess, where you have hours to think about the game and increment to give you extra time for every move), he improved and changed his intuitive, snap, aesthetic judgement too.
    ^
    And that’s the case even if the grandmaster very rarely plays hyperbullet and instead focuses almost solely on classical chess
    - Said Achmiz 21 Aug 2025 19:22 UTC
      4 points
      0
      Parent
      This is true to some extent, but I don’t think it’s relevantly true in the given context. Recall the claim/argument which prompted (the discussion that led to) this post:
      
      ...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
      
      I understood Nate to be saying something other than merely “it is possible for a human to become convinced that FDT is correct, whereupon they will find it intuitive”.
      - sunwillrise 21 Aug 2025 19:31 UTC
        2 points
        0
        Parent
        Hmm. Yeah, I think you’re right. But I suppose I’m a poor advocate for the opposite perspective, since a statement like “Humans come equipped with both intuitions,” in this precise context, yields a category error in my ontology as opposed to being a meaningful statement capable of being true or false.
  - Heighn 22 Aug 2025 10:30 UTC
    3 points
    0
    Parent
    Yeah, it matches wat @mesaoptimizer said, I believe. I was reluctant to post my view, but thought it could be helpful anyway :)
    
    Great question! I’m wondering the same thing now. I, for one, had to be reasoned into it. It does feel like it “clicked”, so to speak, but I doubt whether anyone has this intuition naturally.
    I would be willing to discuss FDT more, if you’d like (in a separate post, of course).
Zack_M_Davis 25 Jul 2025 19:32 UTC
1 point
−16
My understanding is that the two contradictory theories are causal decision theory (CDT), which says to choose the action that will cause the best consequences, and evidential decision theory (EDT), which says to choose the action such that the consequences will be best conditional on the action you chose. Newcomb’s problem makes causal decision theory look bad but evidential decision theory look good. (CDT two-boxes because your choice seemingly can’t cause the prediction, but EDT one-boxes you have more money conditional on one-boxing.) But the smoking lesion problem makes evidential decision theory look bad and causal decision theory look good. (CDT gets to enjoy smoking because it doesn’t cause health problems according to the thought-experiment setup, but EDT doesn’t because according to the setup, you have health problems conditional on enjoying smoking.)
- jessicata 25 Jul 2025 20:40 UTC
  10 points
  5
  Parent
  EDT doesn’t have counterfactuals at all, IMO. It has Bayesian conditionals.
- Said Achmiz 25 Jul 2025 19:35 UTC
  7 points
  5
  Parent
  I am fairly sure that this is not the distinction being made. I think this because the FDT paper first contrasts EDT on the one hand with CDT and FDT on the other hand (saying that CDT and FDT both differ from EDT in the same way), and then goes on to say that CDT and FDT differ in some other way. And the quotes I gave in the OP were also about CDT vs. FDT, with no EDT involved.
- Warty 25 Jul 2025 19:45 UTC
  −2 points
  −3
  Parent
  I never got that cause is deciding to smoke much of an update after you already detected an urge to smoke? edt looks simpler so it should be correct

Vladimir_Nesov 25 Jul 2025 21:34 UTC
4 points
0
A natural way of formulating decision making is to ask how outcomes depend on agent’s behavior. If we try to look at such a dependence pointwise, we might end up with a story for how a particular possible behavior leads to a particular possible world, and in that possible world to a particular outcome (for sufficiently general notions of “possible world” and “outcome”).

This framing is soon in trouble when we consider a fully specified deterministic agent of any kind (either physical or algorithmic), because most possible behaviors are not the actual behavior, and in that sense all possible behaviors except the actual behavior are counterfactual. (This gives a widespread misnomer where “counterfactual” starts referring to all possible behaviors or all possible worlds, including the actual one, even though it’s not literally counterfactual.) It’s not clear how to think about counterfactual things.

Simply asking for the intended meaning (intended construction of counterfactuals, or more generally the intended dependence of everything on agent’s behavior) doesn’t help with the real problem of how to actually make decisions, how the dependence should be constructed. FDT mostly assumes away this issue by giving a way of thinking about the dependence and of formulating it (which in particular translates to its notion of counterfactuals), and asking it to be formulated in this way in the problem statement for decision making.

A possible clue about the issue with counterfactuals is that it’s not known from the outset which behaviors/worlds are counterfactual, and there are no relevant conceptual issues with thinking about the behaviors/worlds that are actual. So a priori (before we know which things are actual) any methods applicable to thinking about the actual behaviors/worlds should also be applicable to the counterfactual ones, and the central puzzle is to figure out how to use this to formulate the dependence of various mostly-counterfactual possibilities of interest on the various mostly-counterfactual possible behaviors of the agent.

[Question] What are the two contradictory theories of how to evaluate counterfactuals?

CDT intuition

FDT intuition