Some counterexamples:

1. Two independent Wiener processes (Brownian motion) have an expected correlation of zero, but the distribution of correlations is wide. You can easily see correlations of magnitude above 0.8 between two independent realisations of the standard Wiener process, independently of sample size. The problem is the strong autocorrelation of the Wiener process, which drastically broadens the distribution of the correlation coefficient.
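A quick simulation makes the width of that distribution concrete (a sketch with numpy and an arbitrary seed; the constants are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_pairs = 1000, 2000

# Two independent standard Wiener processes per pair: cumulative sums
# of i.i.d. Gaussian increments.
x = np.cumsum(rng.standard_normal((n_pairs, n_steps)), axis=1)
y = np.cumsum(rng.standard_normal((n_pairs, n_steps)), axis=1)

# Sample Pearson correlation for each independent pair.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

print(f"mean |r| across pairs: {np.abs(r).mean():.2f}")
print(f"share of pairs with |r| > 0.8: {(np.abs(r) > 0.8).mean():.1%}")
```

The mean |r| sits far from zero and a non-trivial fraction of independent pairs exceed |r| = 0.8, and increasing `n_steps` does not shrink this.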
2. The Moon is slowly receding from the Earth; the Andromeda galaxy is approaching the Milky Way. Therefore the respective distances between them have a large negative correlation. But there is no causal connection. More generally, any two time series, each of which exhibits a monotonic trend over time, will have a substantial correlation.
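As a toy illustration (the constants below are made-up stand-ins, not real ephemeris data):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100.0)  # time steps, arbitrary units

# Toy numbers: one distance trending up, one trending down, plus noise.
moon_earth = 385_000.0 + 0.01 * t + rng.normal(0.0, 0.1, t.size)
milky_andromeda = 2.4e19 - 3.5e9 * t + rng.normal(0.0, 1e8, t.size)

r = np.corrcoef(moon_earth, milky_andromeda)[0, 1]
print(f"r = {r:.3f}")  # strongly negative, with zero causal connection
```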
3. Control systems often exhibit large correlations between variables with no direct causal connection (and near-zero correlation between variables that do have a direct causal connection). Control systems are ubiquitous in the life sciences, social sciences, and technology. The causal relationships within them are cyclic, putting them outside the scope of Reichenbach’s principle and Pearl-style causal analysis.
(1) and (2) are well known to people who analyse time series data, and there are standard methods (prewhitening and detrending respectively) for dealing with them. While (3) gets a mention from time to time, none of the papers I have seen on extending causal analysis to cyclic systems have (IMO) made much progress.
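The standard fix for (1) is easy to demo; first-differencing two independent random walks collapses the spurious correlation (a sketch, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = np.cumsum(rng.standard_normal(n))  # two independent random walks
y = np.cumsum(rng.standard_normal(n))

r_raw = np.corrcoef(x, y)[0, 1]

# Prewhitening by first-differencing: the increments are i.i.d., so the
# correlation of the differenced series is just sampling noise of order
# 1/sqrt(n) around zero, whatever the raw r happened to be.
r_diff = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
print(f"raw r = {r_raw:.2f}, differenced r = {r_diff:.2f}")
```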
Thank you for providing counterexamples! Quite useful, and they convince me that the broadest version of this is false. And apologies for the slow reply—the linked papers took a while to absorb (as you may guess, I’m not a mathematician).
1. I tried for a while, but couldn’t properly follow this paper. From reading the abstract, going through the figures, your summary, and discussing it with Claude (Opus 4.6), I think the core issue here is the same problem as point 2: the measures are not independent. My wife’s phrasing here is that “there’s a causal link between position at time t and position at time t+1” (in this case plus noise; in the case of point 2, plus velocity times the time between measurements).
2. Thank you—slightly embarrassed I didn’t find this one myself. I think the issue here is due to repeated measurement—the moon-earth distance now and at t+1s are obviously connected. I think the missing criterion here is independence—repeatedly measuring the same thing to see its evolution over time breaks it. This requires some narrowing of the core thesis—my proposed added text is “I’m talking about correlations that survive proper statistical scrutiny—where the p-value is computed correctly for the data structure at hand. A naive Pearson correlation between two autocorrelated time series, or two monotonic trends, can look impressive while meaning nothing, but that’s not a real correlation any more than a loaded die produces a real test of probability. The well-known tools for handling this (differencing, detrending, cointegration tests, correcting for effective sample size) exist precisely because statisticians already understand that autocorrelation inflates apparent correlation. My claim applies to correlations that remain after you’ve done the statistics right.”
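One concrete version of “computing the p-value correctly” is to shrink the sample size for autocorrelation before testing. A sketch (first-order lag-1 correction only, with Fisher’s z for the p-value; emphatically not a bulletproof recipe):

```python
import numpy as np
from math import atanh, erfc, sqrt

def lag1_autocorr(z):
    z = z - z.mean()
    return float((z[:-1] * z[1:]).sum() / (z**2).sum())

def corr_p_adjusted(x, y):
    # Pearson r, plus a two-sided p-value from Fisher's z using an
    # effective sample size shrunk for lag-1 autocorrelation.
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    a = lag1_autocorr(x) * lag1_autocorr(y)
    n_eff = max(4.0, n * (1 - a) / (1 + a))
    z = atanh(r) * sqrt(n_eff - 3)
    return r, n_eff, erfc(abs(z) / sqrt(2))

rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_normal(500))  # independent random walks
y = np.cumsum(rng.standard_normal(500))
r, n_eff, p = corr_p_adjusted(x, y)
print(f"r = {r:.2f}, n_eff = {n_eff:.1f}, adjusted p = {p:.2f}")
```

For two random walks the lag-1 autocorrelations are near 1, so the 500 nominal samples shrink to a handful of effective ones, and even a large raw r stops looking significant.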
I think the claim “correlation of independent measures implies causation” is still interesting and surprising (probably not to this crowd who point out plenty of more rigorous prior work, but to most biologists at least), though a bit less exciting than my original claim. In particular, it’s less exciting because it doesn’t have a bulletproof checklist for “doing the statistics right”, which may or may not be possible to make.
3. This paper I could understand. I’ve worked through all their examples, and don’t think any are counterexamples to my point. Indeed, causation can exist without correlation (as you say, very common in biology and technology); I’ve never claimed otherwise. The paper shows some examples where correlation is high despite the causal path containing intermediates with lower correlation, which is intriguing, but I believe every correlation in the paper is explained by a causal path that links the two things correlated. As you say, not necessarily a direct causal connection, but still a “relatively short causal chain linking those things”—an indirect causal connection.
My favorite quote from the paper was “the simple maxim that ‘correlation does not imply causation’ having been superceded by methods such as those set out in [9, 14], and in shorter form in [10]”. Good to see that others (Pearl in addition to Reichenbach) have made something like the point I’m aiming to make here, and that they note the same restriction around time-series data that you brought to my attention in (2): “cited limit attention to systems whose causal connections form a directed acyclic graph, together with certain further restrictions, and also do not consider dynamical systems or time series data.”
So far I don’t think cyclic systems cause any correlations between causally-unlinked variables, but the fact that people doing this formally haven’t solved it makes me hesitant to make any claims as an outsider to the field.

Thanks! I’ve been convinced of the general falsity of Reichenbach’s principle.
The causal relationships within them are cyclic, putting them outside the scope of Reichenbach’s principle and Pearl-style causal analysis.
When we start introducing time into the mix, I think it can be helpful to be somewhat more particular about how we define variables in a causal setting. When you have limited-time-resolution measurements of a quantity over time, you can view it as a single “variable” that is involved in a causal cycle. But you could also “unroll” this cycle in time and view the quantity at each time as a separate variable. If you do this, it seems to me that Pearl-style causal analysis generally holds. Even if X and Y have a monotonic pattern over time, with prior time points of each series causing its later time points but no causal interaction between them, the X(t)s and Y(t)s aren’t correlated with each other, and the pattern over time is fully explained by the causal structure. This makes sense in a Pearl-style analysis because you would have an SCM for the series that looks something like X(t) = f(X(t-1), X(t-2), …), which treats the X(t)s as separate variables. The correlation over time of X and Y isn’t a correlation between variables in this model; it involves mixing different variables together. If we treat them separately, it seems like the Pearl-style analysis still works and makes no prediction errors, and in fact has the advantage of being robust to potential interventions.
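To put numbers on the unrolling picture, here is a sketch with independent drifting random walks (no causal edges between the X(t)s and Y(t)s; drifts and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n_runs, T = 5000, 100

# Unrolled SCM: X(t) = X(t-1) + 0.5 + noise, Y(t) = Y(t-1) + 0.3 + noise,
# simulated as many independent runs, with no X-Y edges anywhere.
x = np.cumsum(0.5 + rng.standard_normal((n_runs, T)), axis=1)
y = np.cumsum(0.3 + rng.standard_normal((n_runs, T)), axis=1)

# At a fixed time t, across independent runs, X(t) and Y(t) are uncorrelated.
r_fixed_t = np.corrcoef(x[:, -1], y[:, -1])[0, 1]

# But mixing time points within a run (the usual "correlate the two series"
# move) gives a large r, purely from the shared monotonic drift.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r_over_time = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

print(f"corr across runs at fixed t: {r_fixed_t:.3f}")
print(f"mean corr over time within a run: {r_over_time.mean():.3f}")
```

The variables of the unrolled model (the X(t)s and Y(t)s at a fixed t) show no correlation; only mixing different time-indexed variables into one pooled series does.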
While (3) gets a mention from time to time, none of the papers I have seen on extending causal analysis to cyclic systems have (IMO) made much progress.
Although I won’t claim to understand all the claims and concepts fully, I’ve found this paper to be interesting and helpful in this regard, and to me some of the concepts seem to have connections to the perspective I offer above.
Pearl-style SCMs assume that every single node in a graph is ontologically independent, which makes unrolled models of the sort suggested above not particularly great.
From a paper co-authored by Pearl himself ( https://commonsensereasoning.org/2005/hopkins.pdf ):
The problem with using structural causal models is that the language of structural models is simply not expressive enough to capture certain intricate relationships that are important in causal reasoning. The ontological commitment of these models is to facts alone, assignments of values to random variables, much like propositional logic. Just as propositional logic is not a particularly effective tool for reasoning about dynamic situations, it is similarly difficult to express dynamic situations (with objects, relationships, and time) in terms of structural causal models.
I haven’t been active in causality research for about five years now, but I’m not aware of any good solutions to the time problem. I do know there are proposals for models that make improvements for causality involving sets of related variables, e.g. plate models. I think our own work on counterfactual probabilistic programming has a pretty strong basis, although the philosophy is fairly abridged in the paper.