If you look back at my posts above, you'll see that I deduce that Pearl’s calculus gives P[X = 0 | do(Z = 1)] = P[X = 0 | do(Yi = 1 for all i)] = 1. So here we agree about what Pearl’s calculus says.
No, we disagree. My calculations suggest that P[X = 0 | do(Yi = 1 for all i)] = P[X = 1 | do(Yi = 1 for all i)] = 0. The intervention falls outside the region where the original joint pdf has positive mass. The intervention do(X = 1) also annihilates the original joint pdf, because there is no region of positive mass in which X = 1.
I still don’t understand why you don’t think the problem is a language problem. Pearl’s counterfactuals have a specific meaning, so of course they don’t mean anything other than what they mean, even if that other meaning is a more plausible interpretation of the counterfactual (again, whatever that means; I’m still not sure what “more plausible” is supposed to mean theoretically).
The problem is that the counterfactual interpretation of this is “If the average value of the Yi were 1, then X would have been 0”. And that seems plainly implausible as a counterfactual. The much more plausible counterfactual backtracks to change X, allowing all the Yi to change to 1 through a single change in the causal graph, namely “If the average value of the Yi were 1, then X would have been 1”.
I think the problem is that when you intervene to make something impossible happen, the resulting system no longer makes sense.
I believe you agreed that the Gore counterfactual needs to backtrack to make sense, so you agree with backtracking in principle?
Yes. (I assume you mean “If Gore had been president during 9/11, he wouldn’t have invaded Iraq.”)
In that case, you should disagree with the Pearl treatment of counterfactuals, since they never backtrack (they can’t).
Why should I disagree with Pearl’s treatment of counterfactuals that don’t backtrack?
Isn’t the decision of whether or not a given counterfactual backtracks in its most “natural” interpretation largely a linguistic problem?
> No, we disagree. My calculations suggest that P[X = 0 | do(Yi = 1 for all i)] = P[X = 1 | do(Yi = 1 for all i)] = 0. The intervention falls outside the region where the original joint pdf has positive mass. The intervention do(X = 1) also annihilates the original joint pdf, because there is no region of positive mass in which X = 1.
I don’t think that’s correct. My understanding of the intervention do(Yi = 1 for all i) is that it creates a disconnected graph. In that graph the Yi all have the value 1 (as stipulated by the intervention), but X retains its original mass function P[X = 0] = 1. The intervention severs the causal links from X to the Yi. So it doesn’t matter that the intervention has zero probability in the original graph, because the intervention creates a new graph. (Interventions into deterministic systems will often have zero probability in the original system, though not in the intervened one.) On the other hand, you claim to be following Pearl_2012, whereas I’ve been reading Pearl_2001, and there might have been some differences in his treatment of impossible interventions… I’ll check this out.
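A minimal Python sketch of the graph surgery described above. It assumes a graph the thread never writes out: X is the sole parent of each of Y1..Yn, and Z depends on all the Yi. The helper names `build_parents`, `mutilate` and `children` are hypothetical, invented for illustration. Cutting every edge into the intervened Yi leaves X with no parents and no children.

```python
def build_parents(n):
    """Hypothetical graph (an assumption): X -> Yi for i = 1..n, and every Yi -> Z."""
    ys = [f"Y{i}" for i in range(1, n + 1)]
    parents = {"X": set(), "Z": set(ys)}
    for y in ys:
        parents[y] = {"X"}
    return parents

def mutilate(parents, intervened):
    """Graph surgery for do(...): delete every edge pointing into an intervened variable."""
    return {v: set() if v in intervened else set(ps) for v, ps in parents.items()}

def children(parents, node):
    """All variables that list `node` among their parents."""
    return {v for v, ps in parents.items() if node in ps}

n = 3
graph = build_parents(n)
after = mutilate(graph, {f"Y{i}" for i in range(1, n + 1)})
print(sorted(children(graph, "X")))      # ['Y1', 'Y2', 'Y3']
print(children(after, "X"), after["X"])  # set() set(): X is now isolated
```

Only the edges into the Yi are removed. The Yi -> Z edges survive, and so does X's own (empty) parent set. That is why, on this reading, X keeps its original distribution.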
For now, just suppose the original distribution over X were P[X = 0] = 1 - epsilon and P[X = 1] = epsilon for some very small epsilon. Would you agree that the intervention do(Yi = 1 for all i) now falls within the region of positive mass, but still doesn’t change the distribution over X? In that case we still have P[X = 0 | do(Yi = 1 for all i)] = 1 - epsilon and P[X = 1 | do(Yi = 1 for all i)] = epsilon.
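To make the epsilon version concrete, here is a sketch of the truncated factorization on a hypothetical model in which each Yi is a deterministic copy of X (Yi := X). That mechanism and the function names are assumptions for illustration only. For contrast, the sketch also computes ordinary conditioning on Yi = 1 for all i. Conditioning does backtrack to X, and it is undefined when epsilon = 0.

```python
from itertools import product

def prior_x(x, eps):
    """Assumed root distribution: P[X = 1] = eps, P[X = 0] = 1 - eps."""
    return eps if x == 1 else 1.0 - eps

def mech_y(y, x):
    """Assumed deterministic mechanism Yi := X, written as P(Yi = y | X = x)."""
    return 1.0 if y == x else 0.0

def p_x_do_all_y_one(x_query, n, eps):
    """P[X = x_query | do(Yi = 1 for all i)] via truncated factorization:
    keep only assignments with every Yi = 1 and drop the P(Yi | X) factors."""
    total = 0.0
    for x, *ys in product((0, 1), repeat=n + 1):
        if all(y == 1 for y in ys) and x == x_query:
            total += prior_x(x, eps)
    return total

def p_x_given_all_y_one(x_query, n, eps):
    """Ordinary conditioning P[X = x_query | Yi = 1 for all i]; None if the event has zero mass."""
    joint = {x: prior_x(x, eps) for x in (0, 1)}
    for x in joint:
        for _ in range(n):
            joint[x] *= mech_y(1, x)
    evidence = sum(joint.values())
    return None if evidence == 0 else joint[x_query] / evidence

eps, n = 1e-3, 3
print(p_x_do_all_y_one(0, n, eps), p_x_do_all_y_one(1, n, eps))
print(p_x_given_all_y_one(1, n, eps))
print(p_x_given_all_y_one(1, n, 0.0))
```

Under these assumptions the intervention returns 1 - epsilon and epsilon, leaving the mass on X untouched. Conditioning moves all the mass to X = 1, which is the backtracking reading. When epsilon = 0 there is nothing to renormalize, so conditioning returns `None`.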
> Isn’t the decision of whether or not a given counterfactual backtracks in its most “natural” interpretation largely a linguistic problem?
I still think it’s a conceptual analysis problem rather than a linguistic problem. However perhaps we should play the taboo game on “linguistic” and “conceptual” since it seems we mean different things by them (and possibly what you mean by “linguistic” is close to what I mean by “conceptual” at least where we are talking about concepts expressed in English).
Thanks anyway.
You seem to be done, so I won’t belabor things further; I just want to point out that I didn’t claim to have a more updated copy of Pearl (in fact, I said the opposite). I doubt there’s been any change to his algorithm.
All this ASCII math is confusing the heck out of me, anyway.
EDIT: Oh, dear. I see how horribly wrong I was now. The version of the formula I was looking at said “(formula) for (un-intervened variables) consistent with (intervention), and zero otherwise” and because it was a deterministic system my mind conflated the two kinds of consistency. I’m really sorry to have blown a lot of your free time on my own incompetence.
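The two readings of “consistent with (intervention)” can be compared directly. This hypothetical sketch uses the same assumed model as elsewhere: P[X = 1] = epsilon and each Yi := X. It is only one way to model the conflation described here.

- The default reading requires only that the intervened Yi equal 1, which is the truncated factorization.
- The `conflated=True` variant also multiplies back in the original mechanisms for the Yi. In other words, it additionally demands consistency with the original deterministic system.

```python
from itertools import product

def prior_x(x, eps):
    """Assumed root distribution: P[X = 1] = eps."""
    return eps if x == 1 else 1.0 - eps

def mech_y(y, x):
    """Assumed deterministic mechanism Yi := X."""
    return 1.0 if y == x else 0.0

def p_x_do(x_query, n, eps, conflated=False):
    """P[X = x_query | do(Yi = 1 for all i)] under two readings of 'consistent'."""
    total = 0.0
    for x, *ys in product((0, 1), repeat=n + 1):
        if x != x_query or any(y != 1 for y in ys):
            continue  # assignment must agree with the query and with do(Yi = 1)
        weight = prior_x(x, eps)  # the un-intervened factor for X is always kept
        if conflated:
            # Mistaken reading: also require the Yi to be what the ORIGINAL
            # mechanisms would produce, by restoring the dropped P(Yi | X) factors.
            for y in ys:
                weight *= mech_y(y, x)
        total += weight
    return total

for conflated in (False, True):
    print(conflated, [p_x_do(x, 3, 0.0, conflated) for x in (0, 1)])
# prints: False [1.0, 0.0] then True [0.0, 0.0]
```

With epsilon = 0 the conflated reading gives zero for both values of X, like the earlier P[X = 0 | do(Yi = 1 for all i)] = P[X = 1 | do(Yi = 1 for all i)] = 0. The truncated factorization instead gives P[X = 0 | do(Yi = 1 for all i)] = 1.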
Thanks for that. You just saved me a few hours of additional research on Pearl to find out whether I’d got it wrong (and misapplied the calculus for interventions that are impossible in the original system)!
Incidentally, I’m quite a fan of Pearl’s work, and I think there should be ways to adjust the calculus to allow reasonable backtracking counterfactuals as well as forward-tracking ones (i.e. ways to find a minimal intervention further back in the graph, one which then makes the antecedent come out true). But that’s probably worth a separate post, and I’m not ready for it yet.
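A rough sketch of what “find a minimal intervention further back in the graph” could look like. It rests on loudly stated assumptions:

- The model is deterministic, the actual world has X = 0, and each Yi := X.
- “Minimal” means fewest intervened variables.
- The search is brute force, exponential, and only meant for toy graphs.

The function names are hypothetical. On this model the search finds do(X = 1), the single change that backtracks to X. It does not pick the n-variable forward-tracking intervention do(Yi = 1 for all i).

```python
from itertools import combinations, product

def solve(mechanisms, order, interventions):
    """Evaluate variables in topological order; intervened ones ignore their mechanism."""
    world = {}
    for v in order:
        world[v] = interventions[v] if v in interventions else mechanisms[v](world)
    return world

def minimal_intervention(mechanisms, order, antecedent, values=(0, 1)):
    """Brute force: try intervention sets of increasing size until the antecedent holds."""
    for size in range(len(order) + 1):
        for chosen in combinations(order, size):
            for vals in product(values, repeat=size):
                interventions = dict(zip(chosen, vals))
                world = solve(mechanisms, order, interventions)
                if antecedent(world):
                    return interventions, world
    return None

# Hypothetical deterministic model: actual X = 0, each Yi copies X.
n = 3
ys = [f"Y{i}" for i in range(1, n + 1)]
order = ["X"] + ys
mechanisms = {"X": lambda w: 0}
for y in ys:
    mechanisms[y] = lambda w: w["X"]

def all_ones(world):
    """Antecedent: every Yi equals 1 (so their average is 1)."""
    return all(world[y] == 1 for y in ys)

backtrack, world = minimal_intervention(mechanisms, order, all_ones)
forward = solve(mechanisms, order, {y: 1 for y in ys})
print(backtrack, world["X"])  # {'X': 1} 1
print(forward["X"])           # 0: forward-tracking leaves X at its actual value
```

Counting intervened variables is just one candidate measure of minimality. A fuller proposal would need to say what makes one intervention “smaller” than another.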