The urge to infer causation from correlation must be powerful. We easily spot unwarranted causal inferences when they come in familiar forms, apparently because we have been overtrained to recognize certain patterns, but as soon as the same caveat is expressed in a novel way, we have to work to apply the principle. Simpson’s Paradox seems to be not just the bearer of the message that you shouldn’t make automatic causal inferences from mere correlation; it is an explanation of why that inference is invalid. A blind correlation 1) doesn’t screen out confounds, and 2) might screen out the causal factor.
It seems that we’ve learned part 1 well, but the complete explanation of how correlations can hide causes includes part 2, and part 2 seems harder. While we’ve all learned to spot instances of part 1, we still founder on part 2. We’re inclined to think partitioning the data can’t make the situation epistemically worse, but it can, by screening out the wrong variable, that is, the causal variable.
So in the real-life example, we don’t find it so counterintuitive that data about the success rates of men and women fail to prove discrimination when you don’t control for the confounds. But we do stumble when it goes the other way. If we had data showing that women do better than men for the competitive positions as well as the easy positions, we would still find it hard to see that this doesn’t prove that women do better than men overall.
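To see with numbers how this can happen, here is a minimal sketch in Python. The counts are invented purely for illustration (they are not data from the real-life example), and the names `applications`, `rate`, and `overall_rate` are hypothetical. Women are hired at a higher rate than men for both the competitive and the easy positions, yet at a lower rate overall, because most women in this made-up data applied for the competitive positions.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration: (hired, applied)
applications = {
    "competitive": {"women": (20, 80), "men": (4, 20)},
    "easy": {"women": (18, 20), "men": (64, 80)},
}


def rate(hired, applied):
    """Success rate as an exact fraction."""
    return Fraction(hired, applied)


def overall_rate(data, group):
    """Pool hires and applications across all position types for one group."""
    hired = sum(data[position][group][0] for position in data)
    applied = sum(data[position][group][1] for position in data)
    return Fraction(hired, applied)


# Success rates within each kind of position
per_position = {
    position: {group: rate(*counts) for group, counts in groups.items()}
    for position, groups in applications.items()
}

# Success rates with the kind of position ignored
overall = {group: overall_rate(applications, group) for group in ("women", "men")}

for position, rates in per_position.items():
    print(position, "women:", float(rates["women"]), "men:", float(rates["men"]))
print("overall", "women:", float(overall["women"]), "men:", float(overall["men"]))
```

With these made-up counts, women lead within each kind of position (25% vs. 20%, and 90% vs. 80%) but trail overall (38% vs. 68%). Which of the two comparisons speaks to discrimination depends on whether the kind of position is a confound or part of the causal story, and the numbers alone can’t settle that.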