“Bayesianism and Causality, Or, Why I am only a Half-Bayesian”.
As a (mostly irrelevant) side note, this is Pearl_2001, who is a very different person from Pearl_2012.
Also, there is no “paradox” within Pearl’s calculus here: it is internally consistent.
I’m using the word paradox in the sense of “puzzling conclusion”, not “logical inconsistency.” Hence “apparent paradox”, which wouldn’t make sense under the latter definition.
It is a bit unfortunate, because while the calculus is elegant in its own terms, it does appear that conceptual analysis is what Pearl was attempting. He really did intend his “do” calculus to reflect how we usually understand counterfactuals, only stated more precisely. Pearl was not consciously proposing a “revisionist” account to the effect of: “This is how I’m going to define counterfactuals for the sake of getting some math to work. If your existing definition or intuition about counterfactuals doesn’t match that definition, then sorry, but it still won’t affect my definition.”
The human causal algorithm is frequently, horrifically, wrong. A theory that attempts to model it is, I strongly suspect, less accurate than Pearl’s theory as it stands, at least because it will frequently prefer the post hoc ergo propter hoc inference when it is more appropriate to infer a common cause.
Accordingly, it doesn’t help to say “Regular intuitions say one thing, Pearl’s calculus says another, but the calculus is better, therefore the calculus is right and intuitions are wrong”. You can get away with that in revisionist accounts/definitions but not in regular conceptual analysis.
No, I didn’t say that. In my earlier comments I wondered under what conditions the “natural” interpretation of counterfactuals was preferable. If regular intuition disagrees with Pearl, there are at least two possibilities: intuition is wrong (i.e., a bias exists) or Pearl’s calculus does worse than intuition, which means the calculus needs to be updated. In a sense, the calculus is already a “revisionist” account of the human causal learning algorithm, though I disapprove of the connotations of “revisionist” and believe they don’t apply here.
But there is NO causal link in the opposite direction from the Yi to the X, or from any Yi to any Yj. The causal graph is directed, and the structural equations are asymmetric.
Yes, but my question here was whether or not the graph model was accurate. Purely deterministic graph models are weird in that they are observationally equivalent not just to other graphs with the same skeleton and v-structures, but to any graph with the same skeleton; even worse, one can always add an arrow connecting the ends of any directed path. I understand better now that the only purpose of a deterministic graph model is to fix one model out of this vast set of observationally equivalent ones. I was confused by the plethora of observationally equivalent deterministic graph models.
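To make the observational-equivalence point concrete, here is a minimal sketch (my own toy two-variable models, not anything from Pearl) showing that two deterministic structural models with opposite edge directions generate exactly the same joint distribution, and that only interventions tell them apart:

```python
import random

random.seed(0)

# Model A: graph X -> Y, structural equation Y = X.
def model_a():
    x = random.randint(0, 1)  # exogenous noise feeds X
    y = x                     # Y is determined by X
    return x, y

# Model B: graph Y -> X, structural equation X = Y.
def model_b():
    y = random.randint(0, 1)  # exogenous noise feeds Y
    x = y                     # X is determined by Y
    return x, y

# Observationally indistinguishable: same support, same joint distribution.
samples_a = {model_a() for _ in range(1000)}
samples_b = {model_b() for _ in range(1000)}
assert samples_a == samples_b == {(0, 0), (1, 1)}

# Interventionally distinct: do(X=1) changes Y only in model A.
def model_a_do_x1():
    x = 1       # intervention overrides X's equation
    y = x       # Y's equation still listens to X
    return x, y

def model_b_do_x1():
    y = random.randint(0, 1)  # Y's equation is untouched
    x = 1                     # intervention overrides X's equation
    return x, y

assert {model_a_do_x1() for _ in range(1000)} == {(1, 1)}
assert {model_b_do_x1() for _ in range(1000)} == {(1, 0), (1, 1)}
```

This is exactly the sense in which writing down a particular deterministic model “fixes one out of the vast set”: the choice carries no observational content, only interventional content.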
Incidentally, if you want my one sentence view, I’d say that Pearl is correctly analysing a certain sort of counterfactual but not the general sort he thinks he is analysing. Consider these two counterfactuals:
If A were to happen, then B would happen.
If A were to be made to happen (by outside intervention) then B would happen.
As far as I can tell, the first is given by P(B | A), and the second is P(B_A). Am I missing something really fundamental here?
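Whatever one takes each English sentence to express, the two quantities P(B | A) and P(B_A) do come apart whenever there is confounding. A minimal sketch (my own made-up numbers, common cause U of both A and B, no arrow A → B):

```python
# Hypothetical confounded model: U -> A and U -> B, with NO arrow A -> B.
# Structural equations: A = U, B = U, where U ~ Bernoulli(0.5).
def joint(do_a=None):
    """Enumerate P(A, B), optionally under the intervention do(A=do_a)."""
    dist = {}
    for u in (0, 1):
        p_u = 0.5
        a = u if do_a is None else do_a  # do() severs A's own equation
        b = u                            # B still listens to U
        dist[(a, b)] = dist.get((a, b), 0.0) + p_u
    return dist

obs = joint()
# Conditioning: P(B=1 | A=1) = P(A=1, B=1) / P(A=1) = 1.0,
# because observing A=1 reveals that U=1.
p_b1_given_a1 = obs[(1, 1)] / (obs.get((1, 0), 0.0) + obs[(1, 1)])

# Intervening: P(B_{A=1} = 1) = 0.5, because setting A tells us
# nothing about U, and B still tracks U.
intv = joint(do_a=1)
p_b1_do_a1 = intv[(1, 1)]

print(p_b1_given_a1, p_b1_do_a1)  # 1.0 vs 0.5
```

So identifying the first conditional with P(B | A) commits one to reading “if A were to happen” as evidence about A’s causes, not just about A’s effects.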
I’ve done the calculations for your model, but I’m going to put them in a different comment to separate out mathematical issues from philosophical ones. This comment is already too long.
A couple of points. You say that “the human causal algorithm is frequently, horrifically, wrong”.
But remember here that we are discussing the human counterfactual algorithm, and my understanding of the experimental evidence on counterfactual reasoning (e.g. on cases like Kennedy or Gore) is that it is pretty consistent across human subjects, and between “naive” subjects (taken straight off the street) and “expert” subjects (who have been thinking seriously about the matter). There is also quite a lot of consistency on what constitutes a “plausible” versus a “far out” counterfactual, and a much stronger sense of what happens in the cases with plausible antecedents than in cases with weird antecedents (such as what Caesar would have done if fighting in Korea). It’s also interesting that there are rather a lot of formal analyses which almost match the human algorithm, but not quite, and that there is quite a lot of consensus on the counterexamples (that they genuinely are counterexamples, and that the formal analysis gets it wrong).
What pretty much everyone agrees on is that counterfactuals involving macro-variable antecedents assume some back-tracking before the time of the antecedent, and that a small micro-state change to set up the antecedent is more plausible than a sudden macro-change which involves breaks across multiple micro-states.
And on your other point, simple conditioning P(B | A) gives results more like the indicative conditional (“If Oswald did not shoot Kennedy, then someone else did”) rather than the counterfactual conditional (“If Oswald had not shot Kennedy, then no-one else would have”).
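The Oswald contrast can be made concrete with Pearl’s three-step counterfactual recipe (abduction, action, prediction). A toy sketch, with entirely hypothetical numbers of my own choosing:

```python
# Toy SCM for the Oswald/Kennedy example (hypothetical priors).
# Exogenous: U_S (Oswald shoots), U_O (an independent second shooter acts).
# Structural: S = U_S; O = U_O; K = S or O  (Kennedy dies if anyone shoots).
P_US = 0.5   # assumed prior that Oswald shoots
P_UO = 0.01  # assumed prior of an independent second shooter

# Indicative "If Oswald did not shoot, someone else did":
# condition the observed joint on S=0 and K=1. Since K = S or O,
# the only S=0 worlds with K=1 are those with O=1, so the answer is 1.
num = (1 - P_US) * P_UO   # P(S=0, O=1, K=1)
den = (1 - P_US) * P_UO   # P(S=0, K=1)
indicative = num / den    # = 1.0

# Counterfactual "If Oswald had not shot, Kennedy would still have died":
# 1. Abduction: update the noise terms on the actual evidence S=1, K=1.
#    S=1 already entails K=1, so U_O keeps its prior of 0.01.
# 2. Action: sever S's equation and set S=0.
# 3. Prediction: K = 0 or O = O, so P(K=1) = P(U_O) = 0.01.
counterfactual = P_UO

print(indicative, counterfactual)  # 1.0 vs 0.01
```

The indicative conditional updates on the antecedent as evidence; the counterfactual holds the inferred background fixed and surgically alters the antecedent, which is why the two sentences get opposite verdicts from the same model.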
Granted. I’m a mathematician, not a cognitive scientist.