The first intuition is that the counterfactual involves changing the physical result of your decision making, not the process of your decision making itself. The second intuition is that the counterfactual involves a replacement of the process of your decision making such that you’d take a different action than you normally would.
I imagine it as the following:
Physical intervention: I imagine that I’m possessed by a demon that leads me to physically carry out a different option than the one I would have chosen voluntarily.
Logical intervention: I imagine that I was a different person with a different life history, one that would have led me to choose a different path than the me in physical reality would choose.
This doesn’t quite communicate how loopy logical intervention can feel, however: I usually imagine logical counterfactuals as worlds where 2+2=3, or something equally and clearly illogical, is effectively part of the bedrock of the universe.
I don’t think that different problems lead one to develop different intuitions. I think that physical intervention is the more intuitive way people relate to counterfactuals, including for mundane decision theory problems like Newcomb’s problem, and that logical intervention is something people need clarifying thought experiments to get used to. I found Counterlogical Mugging (which is Counterfactual Mugging, but with a statement you have logical uncertainty over) to be a very useful intuition pump for starting to think in terms of logical intervention as a counterfactual.
For a more rigorous explanation, here’s the relevant section from MacDermott et al., “Characterising Decision Theories with Mechanised Causal Graphs”:
But in the Twin Prisoner’s Dilemma, one might interpret the policy node in two different ways, and the interpretation will affect the causal structure. We could interpret intervening on your policy D̃ as changing the physical result of the compilation of your source code, such that an intervention will only affect your decision D, and not that of your twin T. Under this physical notion of causality, we get fig. 3a, where there is a common cause S explaining the correlation between the agent’s policy and its twin’s.
But on the other hand, if we think of intervening on your policy as changing the way your source code compiles in all cases, then intervening on it will affect your opponent’s policy, which is compiled from the same code. In this case, we get the structure shown in fig. 3b, where an intervention on my policy would affect my twin’s policy. We can view this as an intervention on an abstract “logical” variable rather than an ordinary physical variable. We therefore call the resulting model a logical-causal model.
Pearl’s notion of causality is the physical one, but Pearl-style graphs have also been used in the decision theory literature to represent logical causality. One purpose of this paper is to show that mechanism variables are a useful addition to any graphical model being used in decision theory.
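To make the difference concrete, here’s a toy sketch of how the two readings of “intervening on your policy” come apart in the Twin Prisoner’s Dilemma. This is my own illustration in Python; the variable names and payoff numbers are made up and aren’t taken from the paper.

```python
# A toy Twin Prisoner's Dilemma (my own illustration, not code from the paper).

# Standard PD payoffs for the first player.
PAYOFFS = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def compile_policy(source):
    """Both my policy and my twin's policy are compiled from the same source code S."""
    return "cooperate" if source == "cooperative" else "defect"

source = "cooperative"  # the common cause S shared by me and my twin

# Physical intervention (the fig. 3a reading): I override only my decision D.
# My twin's policy is still compiled from the unchanged source code.
my_decision = "defect"                    # do(D = defect)
twin_decision = compile_policy(source)    # unaffected: "cooperate"
print(PAYOFFS[(my_decision, twin_decision)])  # 5: I defect against a cooperating twin

# Logical intervention (the fig. 3b reading): I change how the source code
# compiles in all cases, so my twin's policy changes along with mine.
my_decision = twin_decision = "defect"    # do(policy = defect), everywhere the code runs
print(PAYOFFS[(my_decision, twin_decision)])  # 1: my twin defects too
```

Under the physical reading my twin keeps cooperating no matter what I set my decision to, whereas under the logical reading changing my policy changes my twin’s as well, which is exactly why the two graphs disagree about what an intervention achieves.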
The first intuition is that the counterfactual involves changing the physical result of your decision making, not the process of your decision making itself. The second intuition is that the counterfactual involves a replacement of the process of your decision making such that you’d take a different action than you normally would.
Hm, this makes me realize I’m not fully sure what’s meant by “counterfactual” here.
I normally think of it as, like: I’m looking at a world history, e.g. with variables A and B and times t=0,1,2 and some relationships between them. And I say “at t=1, A took value a. What if at t=1, it had taken value a′ instead? What would that change at t=2?” It’s clear how to fit decisions I’ve made in the past into that framework.
Or I can run it forwards, looking from t=0 to t=1,2, imagining what’s going to happen by default, and imagining what happens if I make a change at some point. It’s less clear how to fit my own decisions into this framework, because what does “by default” mean then? But I can just pick some decision to plug in at every point where I get to make one, and say that all of these picks give me a counterfactual. (And perhaps by extension, if there are no decision points, I should also consider the imagined “what’s going to happen by default” world to be a counterfactual.)
But if the discussion of counterfactuals starts by talking about decisions I’ve made, or am going to make, then it’s not clear to me whether it can be extended to talk about general interventions on world histories.
I think that the first intuition corresponds to “interventions on causal models using the do operator”. That’s something I don’t think I understand deeply, but I do think I get the basics of, like, “what is this field of study trying to do, what questions is it asking, what sorts of objects does it work with and how do we manipulate them”. (E.g. if this is what we’re doing, then we say “we’re allowed to just set A=a′ at t=1, we don’t need to go back to t=0 and figure out how that state of affairs could have come about”.)
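To check that I’m using the term the way I think I am, here’s the kind of toy picture I have in mind; the structural equations below are made up purely for illustration.

```python
# A made-up world history over t = 0, 1, 2 with two variables A and B,
# just to illustrate the "set A = a' at t=1 and don't look back" move.

def run_history(a_override=None):
    a0 = 1                        # t=0: some initial condition
    a1 = 2 * a0                   # t=1: A is normally determined by t=0
    if a_override is not None:
        a1 = a_override           # do(A = a'): overwrite A without revisiting t=0
    b2 = a1 + 10                  # t=2: B depends on A at t=1
    return {"A at t=1": a1, "B at t=2": b2}

print(run_history())               # the factual history: A=2 at t=1, B=12 at t=2
print(run_history(a_override=3))   # under do(A=3): B=13 at t=2, and t=0 is untouched
```

The point I take from the do operator is that I’m allowed to overwrite A at t=1 directly, rather than asking which t=0 state would have produced A=a′.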
Does the second intuition correspond to something that we can talk about without talking about my decisions? (And if so, is it a different thing than the first intuition? Or is it, like, they both naturally extend to a world with no decision points for me, but the way they extend to that is the same in those worlds, and so they only differ in worlds that do have decision points for me?)
Thank you! That’s definitely clearer than anything I’ve read about this on LW to date!
Follow-up question that immediately occurs to me:
Why are these two ways of evaluating counterfactuals and not, like… “answers to two different questions”? What I mean is: if we want to know what would happen in a “counterfactual” case, it seems like the first thing to do is to say “now, by that do you mean to ask what would happen under physical intervention, or what would happen under logical intervention?” Right? Those would (could?) have different answers, and really do seem like different questions, so after realizing that they’re different questions, have we thereby resolved all confusions about “counterfactuals”? Or do some puzzles remain?
What I mean is: if we want to know what would happen in a “counterfactual” case, it seems like the first thing to do is to say “now, by that do you mean to ask what would happen under physical intervention, or what would happen under logical intervention?” Right?
Yes.
Those would (could?) have different answers, and really do seem like different questions, so after realizing that they’re different questions, have we thereby resolved all confusions about “counterfactuals”?
I think that intervening on causality and intervening on logic are the only two ways one could intervene to create an outcome different from the one that actually occurs.
Or do some puzzles remain?
I don’t work in the decision theory field, so I want someone else to answer this question.