After reading this post I was stunned. Now I think the central conclusion is wrong, though I still think it is a great post, and I will go back to being stunned if you convince me the conclusion is correct.
You’ve shown how to identify the correct graph structure from the data. But you’ve erred in assuming that the directed edges of the graph imply causality.
Imagine you did the same analysis, except instead of using O=“overweight” you use W=“wears size 44 or higher pants”. The data would look almost the same. So you would reach an analogous conclusion: that wearing large pants causes one not to exercise. This seems obviously false unless your notion of causality is very different from mine.
In general, I think the following principle holds: inferring causality requires an intervention; it cannot be discovered from observational data alone. A researcher who hypothesized that W causes not-E could round up a bunch of people, have half of them wear big pants, observe the effect of this intervention on exercise rates, and then conclude that there is no causal effect.
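To make the pants scenario concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the structure O->E, I->E, O->W, the made-up probabilities, and the helper names `pearson`, `simulate`, and `summarize`. It is not the analysis from the post, just a toy showing that W inherits O's observational pattern while a randomized pants assignment shows no effect on exercise.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=20000, seed=0, randomize_pants=False):
    """Draw (O, I, E, W) rows from the assumed structure O->E, I->E, O->W."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        o = int(rng.random() < 0.4)                      # overweight
        i = int(rng.random() < 0.5)                      # heavy internet use
        e = int(rng.random() < 0.7 - 0.3 * o - 0.3 * i)  # exercises
        if randomize_pants:
            w = int(rng.random() < 0.5)                  # intervention: coin flip
        else:
            w = int(rng.random() < (0.9 if o else 0.1))  # pants follow weight
        rows.append((o, i, e, w))
    return rows

def summarize(rows):
    """Correlations a conditional-independence search would look at, with W in place of O."""
    o, i, e, w = zip(*rows)
    exercisers = [r for r in rows if r[2] == 1]
    _, i_ex, _, w_ex = zip(*exercisers)
    return {
        "W~E": pearson(w, e),
        "W~I": pearson(w, i),
        "W~I given E=1": pearson(w_ex, i_ex),
    }

observed = summarize(simulate())
intervened = summarize(simulate(randomize_pants=True))
for name in observed:
    print(f"{name:>14}: observed {observed[name]:+.3f}, randomized {intervened[name]:+.3f}")
```

In the observational run, W is correlated with E, roughly independent of I, and becomes dependent on I once you condition on E, which is the same signature O has. In the randomized run the W~E correlation should sit near zero, which is the "no causal effect" result the intervention would report.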
You are correct—directed edges do not imply causality by means of only conditional independence tests. You need something called the faithfulness assumption, and additional (causal) assumptions, that Eliezer glossed over. Without causal assumptions and with only faithfulness, all you are recovering is the structure of a statistical, rather than a causal model. Without faithfulness, conditional independence tests do not imply anything. This is a subtle issue, actually.
There is no magic—you do not get causality without causal assumptions.
Is this another variation of the theme that one needs to assume the possibility of inductive reasoning to make an argument for it (or also assume Occam’s Razor to argue for it)? Also, the specific example he gave seems to me like an instance of “given very skewed data, the best guesses are still wrong” (there was a variation of that here at some point, regarding bets and opponents who have superior information). Or are you thinking of something more subtle?
Even if you assume that we can do induction (and assume faithfulness!), conditional independence tests simply do not select among causal models. They select among statistical models, because conditional independences are properties of joint distributions (statistical, rather than causal objects). Linking those joint distributions with something causal relies on causal assumptions.
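A small worked example of why independence tests see only the joint distribution: the sketch below builds a joint over two binary variables from a hypothetical X->Y model (the numbers are invented), then refactors the very same joint as Y->X using Bayes' rule. The two factorizations agree exactly, so no test on the joint can prefer one arrow over the other.

```python
from fractions import Fraction as F

VALUES = (0, 1)

# Hypothetical model A, read causally as X -> Y: P(x) * P(y | x)
p_x = {0: F(3, 5), 1: F(2, 5)}
p_y_given_x = {
    0: {0: F(4, 5), 1: F(1, 5)},
    1: {0: F(3, 10), 1: F(7, 10)},
}
joint_a = {(x, y): p_x[x] * p_y_given_x[x][y] for x in VALUES for y in VALUES}

# Model B, read causally as Y -> X: P(y) * P(x | y), derived from the same joint
p_y = {y: sum(joint_a[(x, y)] for x in VALUES) for y in VALUES}
p_x_given_y = {y: {x: joint_a[(x, y)] / p_y[y] for x in VALUES} for y in VALUES}
joint_b = {(x, y): p_y[y] * p_x_given_y[y][x] for x in VALUES for y in VALUES}

# Exact rational arithmetic, so equality here is not a floating-point accident
print(joint_a == joint_b)
```

Choosing between A and B as causal stories therefore needs something beyond the joint: an intervention, or a causal assumption.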
I think the biggest lesson to learn from Pearl’s book is to keep statistical and causal notions separate.
Or there might be some hidden third factor, a gene which causes both fat and non-exercise. By Occam’s Razor this is more complicated and its probability is penalized accordingly, but we can’t actually rule it out. It is obviously impossible to do the converse experiment where half the subjects are randomly assigned lower weights, since there’s no known intervention which can cause weight loss.
The model assumes that those are the only relevant variables. Given that assumption, we can prove that weight affects exercise, and that it can’t be the other way around.
If there are unobserved variables, it’s possible that they can cause weight and cause exercise. However that wasn’t one of the hypotheses anyone believed beforehand; they were arguing whether weight causes exercise or if exercise causes weight.
Second, even if there is an unobserved variable, the data still suggest that exercising more will not improve your weight. Otherwise internet use would correlate with weight, because internet use affects exercise: if exercise affected weight at all, then internet use would indirectly cause weight gain, and therefore correlate with it.
The whole point of the article is this trick: taking a weird and unrelated variable like internet use lets us discover the direction of causation, which according to common knowledge about statistics shouldn’t be possible without randomized controlled experiments.
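Here is a toy check of that argument. The probabilities and the function name `internet_weight_corr` are made up for illustration; the sketch only compares two hypothetical worlds, one where exercise affects weight (I->E->O) and one where it does not (O->E<-I), and looks at whether internet use ends up correlated with weight.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def internet_weight_corr(exercise_affects_weight, n=20000, seed=1):
    """Correlation between internet use (I) and overweight (O) in a toy world."""
    rng = random.Random(seed)
    internet, overweight = [], []
    for _ in range(n):
        i = int(rng.random() < 0.5)
        if exercise_affects_weight:
            # Chain I -> E -> O: exercise lowers the chance of being overweight
            e = int(rng.random() < 0.7 - 0.4 * i)
            o = int(rng.random() < 0.6 - 0.4 * e)
        else:
            # Collider O -> E <- I: weight is set independently of internet use
            o = int(rng.random() < 0.4)
            e = int(rng.random() < 0.7 - 0.3 * o - 0.3 * i)
        internet.append(i)
        overweight.append(o)
    return pearson(internet, overweight)

print("exercise affects weight:", round(internet_weight_corr(True), 3))
print("weight affects exercise:", round(internet_weight_corr(False), 3))
```

Only the first world should show a clearly nonzero correlation between internet use and weight. Note that this relies on faithfulness: an effect of exercise on weight that happened to be cancelled by another path would not show up.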
I’m sorry if I’m just being too much of a dodo to perceive the mystery, but your scenario seems easily accounted for. You can use a Bayesian network to infer causality if and only if you have valid data to fill it with. Of course wearing large pants does not cause one not to exercise, but no real set of data would indicate that it did. Am I missing something?
EDIT: shortly after writing this, I read up on faithfulness and Milton Friedman’s thermostat, so the “if and only if” part of my comment isn’t quite accurate. Still, the pants size scenario doesn’t seem like one of these exceptional cases.
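For anyone else who hasn't seen a faithfulness violation, here is a minimal linear sketch in the spirit of the thermostat example. The coefficients and the name `thermostat_demo` are my own assumptions: X influences Y directly with weight -1 and through M with weight +1, so the two paths cancel and X looks independent of Y even though both edges are real.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def thermostat_demo(n=20000, seed=2):
    """Return (corr(X, Y), partial corr(X, Y given M)) for cancelling paths."""
    rng = random.Random(seed)
    xs, ms, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        m = x + rng.gauss(0, 1)        # X -> M with weight +1
        y = m - x + rng.gauss(0, 1)    # M -> Y (+1) and X -> Y (-1) cancel
        xs.append(x)
        ms.append(m)
        ys.append(y)
    r_xy, r_xm, r_my = pearson(xs, ys), pearson(xs, ms), pearson(ms, ys)
    # Standard first-order partial correlation formula
    partial = (r_xy - r_xm * r_my) / ((1 - r_xm ** 2) * (1 - r_my ** 2)) ** 0.5
    return r_xy, partial

r_xy, r_xy_given_m = thermostat_demo()
print(f"corr(X, Y) = {r_xy:+.3f}, partial corr(X, Y | M) = {r_xy_given_m:+.3f}")
```

The marginal correlation should come out near zero while the partial correlation given M is clearly negative, so a search that reads "independent" as "no edge" would get this graph wrong.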
In this case, the true structure would be O->E, O->W, I->E. If O is unobserved, then you confuse a fork for an arrow, but I’m not sure you can actually get an arrow pointing the wrong way just by omitting variables.
Thanks for clarifying!
He addressed that in the third footnote.