I’m mostly interested in whether X causes Y vs. whether some Z causes both X and Y.
I didn’t find that clear from your article. A correlation between X and Y tells you no more than that causality is present somewhere. It tells you nothing about whether X causes Y, Y causes X, or Z causes both X and Y, nothing about how long the causal chains are, and nothing about whether the correlation is a sampling artefact due to common effects of X and Y.
Those options aren’t mutually exclusive...
Or exhaustive. Imperfect sampling can produce sample correlations among variables with no causal connection. (Toy example: X and Y are independent, Z is jointly caused by X and Y and is equal to X+Y, and everyone is unwittingly sampling from a subpopulation with a narrow range of values of Z. Within that sample, X and Y will have a strong negative correlation.)
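To make the arithmetic behind the toy example explicit, here is a sketch of the limiting case in which the sampled subpopulation has exactly one value z of Z (my simplification; a narrow range of Z behaves approximately the same way). Since Y = z - X within the sample, and assuming X still varies there:

$$
\operatorname{Cov}(X, Y \mid Z = z) = \operatorname{Cov}(X, z - X \mid Z = z) = -\operatorname{Var}(X \mid Z = z),
\qquad
\operatorname{Var}(Y \mid Z = z) = \operatorname{Var}(X \mid Z = z),
$$

$$
\operatorname{Corr}(X, Y \mid Z = z) = \frac{-\operatorname{Var}(X \mid Z = z)}{\sqrt{\operatorname{Var}(X \mid Z = z)\,\operatorname{Var}(X \mid Z = z)}} = -1 .
$$

So the sample correlation is exactly -1 in the limit, whatever the (zero) correlation in the full population, and it moves away from -1 as the range of Z in the sample widens.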
Could you give a concrete example of such sampling bias?
A real one? Not offhand, not being a statistician, but sampling bias is a standard problem that has to be guarded against in statistical investigations. It can affect not just the sample means of variables, but also correlations and indeed every other statistic.
To flesh out the toy example with an imaginary narrative, suppose X = intelligence, Y = effort, and Z = exam grade. Suppose Z is highly correlated with X+Y. If we divide the population up by exam grade, we may find that in every subpopulation, X and Y are negatively correlated, even while in the whole population, X and Y are uncorrelated.
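For anyone who wants to check the exam-grade story numerically, here is a rough simulation. Everything specific in it is my own assumption for illustration, not something claimed above: X and Y are independent standard normals, the grade Z is X + Y plus a little noise, and the names `pearson`, `simulate`, `noise_sd` and `n_bands` are made up for this sketch.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, noise_sd=0.5, n_bands=5, seed=0):
    """Return (whole-population corr, list of corrs within each grade band)."""
    rng = random.Random(seed)
    # Assumption: intelligence (x) and effort (y) are independent N(0, 1).
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: grade is x + y plus noise, so it is highly (not perfectly)
    # correlated with x + y.
    z = [a + b + rng.gauss(0, noise_sd) for a, b in zip(x, y)]

    overall = pearson(x, y)

    # Divide the population into equal-sized bands by exam grade.
    order = sorted(range(n), key=lambda i: z[i])
    size = n // n_bands
    band_corrs = []
    for k in range(n_bands):
        idx = order[k * size:(k + 1) * size]
        band_corrs.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    return overall, band_corrs


if __name__ == "__main__":
    overall, band_corrs = simulate()
    print("whole population:", round(overall, 3))
    for k, c in enumerate(band_corrs):
        print("grade band", k, ":", round(c, 3))
```

Under these assumptions the whole-population correlation should come out close to zero while every grade band shows a clearly negative correlation. Making `noise_sd` larger weakens the effect, since the grade then pins down X + Y less tightly.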