I don’t understand what counterfactuals have to do with Newcomb’s problem. You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes. I don’t see any updating on a factual world to infer outcomes in a counterfactual world.
“Prediction” in this context is a synonym for conditioning. P(x|y) is defined as P(x,y)/P(y).
If intervention sounds circular... I don’t know what to say other than read Chapter 1 of Pearl (https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X).
To give a two-sentence technical explanation:
A structural causal model is a straight-line program with some random inputs. Such a program looks like this:
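u1 = randBool()
rain = u1
sprinkler = !rain
wet_grass = rain || sprinkler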
Structural causal models are usually drawn as graphs of nodes and edges, but these are equivalent to straight-line programs, and one can translate easily between the two presentations.
In the basic Pearl setup, an intervention consists of replacing one of the assignments above with an assignment to a constant. Here is an intervention setting the sprinkler off.
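u1 = randBool()
rain = u1
sprinkler = false   // the intervention: do(sprinkler = false)
wet_grass = rain || sprinkler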
From this, one can easily compute that P(wet_grass | do(sprinkler=false)) = 1/2: with the sprinkler forced off, wet_grass reduces to rain, which is true half the time.
If you want the technical development of counterfactuals that my post is based on, read Pearl Chapter 7, or Google around for the “twin network construction.”
Or I’ll just show you in code below how you compute the counterfactual “I see the sprinkler is on, so, if it hadn’t come on, the grass would not be wet,” which is written P(wet_grass | sprinkler=true, do(sprinkler=false)) = 0.
We construct a new program,
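roughly like this (a sketch in the style of the snippets above; the factual and counterfactual halves share the background variable rain, and only the sprinkler assignment differs):

u1 = randBool()
rain = u1
// factual world: the sprinkler follows its usual mechanism
sprinkler_factual = !rain
wet_grass_factual = rain || sprinkler_factual
// counterfactual world: same background noise, but the sprinkler is forced off
sprinkler_counterfactual = false
wet_grass_counterfactual = rain || sprinkler_counterfactual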
This is now reduced to a pure statistical problem. Run this program a bunch of times, filter down to only the runs where sprinkler_factual is true, and you’ll find that wet_grass_counterfactual is false in all of them.
If you write this program as a dataflow graph, you see that everything downstream of the intervention point is duplicated, while the background variables (the rain) are shared between the two copies. This graph is the twin network, and this technique is called the “twin network construction.” It can also be thought of as what the do(y | x → e) operator is doing in our Omega language.
Everyone agrees what you should do if you can precommit. The question becomes philosophically interesting when an agent faces this problem without having had the opportunity to precommit.
Okay, I see how that technique of breaking circularity in the model looks like precommitment.
I still don’t see what this has to do with counterfactuals though.
“You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes.”
So you wouldn’t class this as precommitment?
I realize now that this expressed as a DAG looks identical to precommitment.
Except, I also think it’s a faithful representation of the typical Newcomb scenario.
Paradox only arises if you can say “I am a two-boxer” (by picking up two boxes) while you were predicted to be a one-boxer. This can only happen if there are multiple nodes for two-boxing set to different values.
But really, this is a problem of the kind solved by superspecs in my Onward! paper. There is a constraint that the prediction of two-boxing must be the same as the actual two-boxing. Traditional causal DAGs can only express this by making them literally the same node; superspecs allow more flexibility. I am unclear how exactly it’s handled in FDT, but it has a similar analysis of the problem (“CDT breaks correlations”).