1: The difference with 1 is that on average, the correlation is still 0. That is, if you have a model where there’s some “true” latent correlation between the Wiener processes, then the observed sample correlation will have an expectation of 0. Observing some sample correlation doesn’t necessarily mean that there’s “true” latent correlation, of course.
2: …okay, this threw me for a loop. My first thought was to say “it’s not correlation that’s relevant here, it’s statistical dependence”… but still, the second distance is not independent of the first, in that knowing how far away the moon is lets you deduce how long it’s been and thus how far Andromeda is.
My next thought is, essentially, this argument in the Stanford Encyclopedia of Philosophy:

When we sample Venetian sea levels over the course of years, we are not drawing probabilistically independent samples from a stable probability distribution. If the sea level is high in a particular year, we can predict that the sea level will be similarly high the following year (they tend not to change dramatically from one year to the next). For this reason, we cannot interpret the relative frequencies that we obtain as estimates of an underlying probability distribution. Thus, even though there is a correlation between V and L in our samples, it is impossible to interpret this as a probabilistic correlation with p(V∩L)>p(V)p(L). Not all correlations in statistical samples bespeak probabilistic correlations.
That is, given the information up to a certain date, learning the value of one of the variables in the next time step won’t help you predict the other one, unless there’s a causal connection somewhere.
I haven’t read the rest of that page, so, perhaps there’s something else in there that’d throw me hard enough to change my mind.
3: There’s of course no problem with near-zero or even actually-zero correlation between causally connected variables (likewise if you replace correlation with statistical dependence). And of course, having no direct causal connection is no problem either—that’s the whole point.
I wouldn’t say that the causal relations are cyclic. Rather, you have variables for the values at each time, and the causes go in a zigzag back and forth as you go through time.
The OP asserted that correlation implies causation, and I gave three counterexamples.
That is, if you have a model where there’s some “true” latent correlation between the Wiener processes, then the observed sample correlation with have expectation of 0.
The correlation between independent Wiener processes over some interval is a random variable with substantial spread independent of the size of the interval. Its distribution does have a mean, and that mean is zero, but there is no such thing as the “true” correlation, any more than there is a “true” outcome of an unthrown die. If someone measures the correlation of a sample, they will draw false conclusions if they do not realise the extent to which autocorrelation is destroying their effective sample size.
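A quick simulation makes this concrete (a sketch in Python with NumPy; the step counts, trial counts, and seed are arbitrary choices for illustration): the sample correlation of two independent random walks has a wide spread, and that spread does not narrow as the walks get longer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_corr_of_independent_walks(n_steps):
    # Two independent random walks (discretised Wiener processes).
    x = np.cumsum(rng.standard_normal(n_steps))
    y = np.cumsum(rng.standard_normal(n_steps))
    return np.corrcoef(x, y)[0, 1]

# The distribution of the observed correlation has mean zero but a
# substantial spread that is roughly the same at n=100 and n=10,000:
# a single trial can easily show |r| > 0.5 purely by chance.
for n in (100, 10_000):
    corrs = np.array([sample_corr_of_independent_walks(n) for _ in range(2000)])
    print(f"n={n}: mean={corrs.mean():+.3f}, sd={corrs.std():.3f}")
```

The naive standard error for a sample correlation assumes independent draws; here the autocorrelation of the walks makes the effective sample size far smaller than the number of points, which is exactly the trap described above.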
the second distance is not independent of the first, in that knowing how far the moon lets you deduce how long it’s been and thus how far Andromeda is.
That does not make it a causal connection. If with some future technology we decided to bring the Moon a few metres closer to the Earth, the motions of our galaxy and Andromeda would be unaffected.
In the paragraph from the SEP, I do not understand what distinction it is making between a correlation and a “probabilistic” correlation. It seems to make heavy weather of the idea of successive samples from time series not being independent of each other. Such is the nature of time series. There are tools such as “Granger causality” that will under certain circumstances give reasons for suspecting causation, but they are dependent on discernible time lags between the cause and the caused.
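The idea behind Granger causality can be sketched in a few lines (a toy illustration, not the full econometric machinery; the simulated series, coefficients, and lag order are all invented for the example): ask whether adding lagged values of one series improves a lag-regression prediction of the other, relative to using the other’s own lags alone.

```python
import numpy as np

rng = np.random.default_rng(1)

def granger_rss(y, x, p=2):
    # Residual sum of squares for predicting y[t] from p of its own
    # lags, without and with p lags of x (the Granger comparison).
    n = len(y)
    rows_own = [y[t - p:t][::-1] for t in range(p, n)]
    rows_both = [np.concatenate([y[t - p:t][::-1], x[t - p:t][::-1]])
                 for t in range(p, n)]
    target = y[p:]
    def rss(rows):
        X = np.column_stack([np.ones(len(rows)), rows])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.sum((target - X @ beta) ** 2)
    return rss(np.array(rows_own)), rss(np.array(rows_both))

# x drives y with a one-step lag; z is an unrelated series.
n = 5000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

own, with_x = granger_rss(y, x)
own2, with_z = granger_rss(y, z)
print(f"adding x's lags cuts RSS by {1 - with_x / own:.1%}")
print(f"adding z's lags cuts RSS by {1 - with_z / own2:.1%}")
```

The lagged x values sharply reduce the prediction error for y, while the lagged z values do not; this is the asymmetry the test exploits, and it only works because the causal influence shows up at a discernible lag.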
I wouldn’t say that the causal relations are cyclic. Rather, you have variables for the values at each time, and the causes go in a zigzag back and forth as you go through time.
Time is not discrete. (Speculations about the Planck time are not relevant.) There is no sequence of events round a loop, but continuous change of all the variables at once. This is true even for systems with transport lag and integral lag. Even for time series data, the time step is typically determined by the convenience of collecting the data—e.g. daily, weekly, monthly, or annual rainfall—or is a simplification of a more complicated process, e.g. harvest of annual crops.
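One way to see that the zigzag is an artifact of discretisation (a toy sketch; the coupled pair dx/dt = −y, dy/dt = x is chosen only because its exact solution is a circle, ending at (cos 1, sin 1) after time 1): compare a “zigzag” update, where each variable sees the other’s freshly updated value, with a simultaneous update where both change at once. As the time step shrinks, both converge to the same continuous trajectory.

```python
import math

def simulate(dt, steps, zigzag):
    # Coupled pair dx/dt = -y, dy/dt = x, starting at (1, 0).
    x, y = 1.0, 0.0
    for _ in range(steps):
        if zigzag:
            # "Zigzag" update: y is computed from the already-updated x.
            x = x + dt * (-y)
            y = y + dt * x
        else:
            # Simultaneous update: both steps use the old values.
            x, y = x + dt * (-y), y + dt * x
    return x, y

# Distance from the exact state (cos 1, sin 1) after total time 1:
for dt in (0.1, 0.001):
    steps = round(1 / dt)
    for zig in (False, True):
        x, y = simulate(dt, steps, zig)
        err = math.hypot(x - math.cos(1), y - math.sin(1))
        print(f"dt={dt}, zigzag={zig}: error={err:.4f}")
```

With a coarse step the two discretisations differ noticeably from each other and from the true trajectory; with a fine step both errors shrink toward zero, so neither update order is the “real” causal structure of the continuous system.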