I don’t understand what counterfactuals have to do with Newcomb’s problem. You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes. I don’t see any updating on a factual world to infer outcomes in a counterfactual world.
“Prediction” in this context is a synonym for conditioning. P(x|y) is defined as P(x,y)/P(y).
If intervention sounds circular... I don’t know what to say other than read Chapter 1 of Pearl (https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X).
To give a two-sentence technical explanation:
A structural causal model is a straight-line program with some random inputs. Such a program looks like this:
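u1 = randBool()
rain = u1
sprinkler = !rain
wet_grass = rain || sprinkler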
Structural causal models are usually drawn as graphs of nodes and edges, but these are equivalent to straight-line programs, and one can translate easily between the two presentations.
In the basic Pearl setup, an intervention consists of replacing one of the assignments above with an assignment to a constant. Here is an intervention setting the sprinkler off.
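u1 = randBool()
rain = u1
sprinkler = false   // the intervention: do(sprinkler = false)
wet_grass = rain || sprinkler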
From this, one can easily compute that P(wet_grass | do(sprinkler=false)) = 1/2: with the sprinkler forced off, wet_grass reduces to rain, which is true half the time.
If you want the technical development of counterfactuals that my post is based on, read Pearl Chapter 7, or Google around for the “twin network construction.”
Or I’ll just show you in code below how you compute the counterfactual “I see the sprinkler is on, so, if it hadn’t come on, the grass would not be wet,” which is written P(wet_grass | sprinkler=true, do(sprinkler=false)) = 0.
We construct a new program,
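roughly like this (a sketch in the style of the snippets above; the factual and counterfactual halves share the background variable rain, and only the sprinkler assignment differs):

u1 = randBool()
rain = u1
// factual world: the sprinkler follows its usual mechanism
sprinkler_factual = !rain
wet_grass_factual = rain || sprinkler_factual
// counterfactual world: same background noise, but the sprinkler is forced off
sprinkler_counterfactual = false
wet_grass_counterfactual = rain || sprinkler_counterfactual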
This is now reduced to a pure statistical problem. Run this program a bunch of times, filter down to only the runs where sprinkler_factual is true, and you’ll find that wet_grass_counterfactual is false in all of them.
If you write this program as a dataflow graph, you see that everything downstream of the intervention point is duplicated, while the background variables (the rain) are shared between the two copies. This graph is the twin network, and this technique is called the “twin network construction.” It can also be thought of as what the do(y | x → e) operator is doing in our Omega language.
Everyone agrees what you should do if you can precommit. The question becomes philosophically interesting when an agent faces this problem without having had the opportunity to precommit.
Okay, I see how that technique of breaking circularity in the model looks like precommitment.
I still don’t see what this has to do with counterfactuals though.
“You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes.”
So you wouldn’t class this as precommitment?
I realize now that this expressed as a DAG looks identical to precommitment.
Except, I also think it’s a faithful representation of the typical Newcomb scenario.
Paradox only arises if you can say “I am a two-boxer” (by picking up two boxes) while you were predicted to be a one-boxer. This can only happen if there are multiple nodes for two-boxing set to different values.
But really, this is a problem of the kind solved by superspecs in my Onward! paper. There is a constraint that the prediction of two-boxing must be the same as the actual two-boxing. Traditional causal DAGs can only express this by making them literally the same node; superspecs allow more flexibility. I am unclear how exactly it’s handled in FDT, but it has a similar analysis of the problem (“CDT breaks correlations”).