Correlation Does in Fact Imply Causation
(Cross-posted from my personal website)
Epistemic status: I’ve become slowly convinced of this broad point over the last handful of years, but pretty much every time I mention it in conversation people think it’s wrong. I wrote this in part looking for counterexamples (and to explain the point clearly enough that targeted counterexamples can be identified).
Perhaps the single most-quoted phrase about statistics1 is that ‘correlation does not imply causation.’ It’s a phrase I’ve spoken hundreds of times, even after the ideas that resulted in this essay were broadly developed. It’s often a useful educational tool for beginner-level students, and it’s convenient as a shorthand description of a failure of scientific reasoning that’s disturbingly common: just because A correlates with B, it doesn’t mean that A causes B. The classic example is that ice cream sales correlate with violent crime rates, but that doesn’t mean ice cream fuels crime — and of course this is true, and anyone still making base-level errors is well-served by that catchphrase ‘correlation does not imply causation’.
The thing is, our catchphrase is wrong — correlation does in fact imply2 causation. More precisely, if things are correlated, there exists a relatively short causal chain linking those things, with confidence one minus the p-value of the correlation. Far too many smart people think the catchphrase is literally true, and end up dismissing correlation as uninteresting. It’s of course possible for things to be correlated by chance, in the same way that it’s possible to flip a coin and get 10 heads in a row3, but as sample size increases this becomes less and less likely; that’s the whole point of calculating the p-value when testing for correlation. In other words, there are only two explanations for a correlation: coincidence or causation.
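To put a number on how fast coincidence dies off with sample size, here is a minimal simulation, a sketch of my own using only the Python standard library (the `pearson` and `frac_spurious` helpers are mine, not from any package): it counts how often two fully independent Gaussian variables show |r| > 0.5 purely by chance, at a small and a larger sample size.

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def frac_spurious(n, trials=500, threshold=0.5, seed=0):
    """How often two *independent* Gaussian samples of size n show |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        hits += abs(pearson(xs, ys)) > threshold
    return hits / trials

small = frac_spurious(10)   # tiny samples: chance correlations above 0.5 are common
large = frac_spurious(300)  # larger samples: they essentially vanish
print(small, large)
```

The exact fractions depend on the seed, but the qualitative gap is robust: coincidental correlation is a small-sample phenomenon.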
Let’s return to the ice cream example. It doesn’t take long to guess what’s really going on here: warm weather causes both the spike in violent crime (via increased occupancy of public space and increased irritability) and the craving for a cold treat. So no, ice cream does not cause violent crime. But they are causally linked, through a quite short causal pathway. There are three possible architectures for the pathway: A causes B, B causes A, and C causes both, either directly or indirectly4.
I would hate to push anyone back to the truly naive position that A correlating with B means A causes B, but let’s not say false things: correlation does in fact imply causation5; it just doesn’t show you which direction that causation flows.
Why do I care about correcting this phrase? Two reasons — it is bad as a community to have catchphrases that are factually false, and “correlation does not imply causation” can and has been used for dark arts before. Rather famously, Ronald Fisher spent decades arguing that there was insufficient evidence to conclude that smoking causes lung cancer—because correlation does not imply causation. The tobacco industry was grateful. Meanwhile, the correlation was telling us exactly what we should have been doing: not dismissing it, but designing experiments to determine which of the three causal architectures explained it. The answer, of course, was the obvious one. Correlation was trying to tell us something, and we spent decades pretending it wasn’t allowed to.
Notes
1. This one strikes closest to my heart as a longtime XKCD fan. Randall is almost gesturing at the point I make in this essay, but not quite. At the risk of thinking too hard about a joke (surely not a sin for this particular comic), the key flaw here is the tiny sample size; this isn’t even correlation, the p-value is 0.5. If 1,000 people take a statistics class and we survey them before and after, then we could get a meaningful, statistically-robust correlation here — and unfortunately it would probably be the case that taking the class makes people more likely to believe this phrase.
2. I’m using ‘imply’ in an empirical rather than logical sense — it’s not that correlation proves causation the way a mathematical proof does, but that it provides evidence for causation, with strength proportional to sample size.
3. p=0.00195, being generous and taking the two-tailed value.
4. That “indirectly” is pointing at a fourth option, actually an infinite set of options: C causes A and D which causes B, C causes A and E which causes D which causes B, etc. I’m not including these because it’s natural to consider those as variants on C causes both. As an analogy: if one pushes over the first domino, did that cause the last domino to fall? A pedant might argue the actual cause of the last domino falling was the penultimate domino falling on it, and in some cases that precision can be useful, but most of the time it’s natural to just say the person who pushed the first domino caused the last one to fall over. In practice the causal chain is probably pretty short, because interesting correlations tend to be well below one, and after a few intermediates the correlation strength drops below the noise threshold of detection.
5. With the evidentiary strength you would expect based on the p-value of the correlation. Coincidence is always a possibility, but becomes pretty unlikely for correlations with a large sample size.
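Footnote 4’s last claim, that correlation strength decays along a causal chain, can be sanity-checked with a toy model. This is my own sketch under an assumed linear Gaussian chain, where each link keeps correlation r with its parent; in that model the end-to-end correlation should come out near r^k for k links.

```python
import random
import statistics

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def chain_corr(r, links, n=4000, seed=1):
    """corr(X0, X_last) for a causal chain X0 -> X1 -> ... with per-link correlation r."""
    rng = random.Random(seed)
    first, last = [], []
    for _ in range(n):
        x0 = x = rng.gauss(0, 1)
        for _ in range(links):
            # each link keeps correlation r with its parent and unit variance overall
            x = r * x + (1 - r * r) ** 0.5 * rng.gauss(0, 1)
        first.append(x0)
        last.append(x)
    return pearson(first, last)

print(chain_corr(0.8, 1))  # about 0.8: one strong link
print(chain_corr(0.8, 5))  # about 0.8**5, roughly 0.33: five links, the signal decays
```

This is why a detectable correlation usually points at a short chain: unless every link is near-perfect (the DNA case), a few intermediates push the end-to-end correlation into the noise.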
Some counterexamples:
Two independent Wiener processes (Brownian motion) have an expected correlation of zero, but the distribution of correlations is wide. You can easily see correlations of magnitude above 0.8 between two independent realisations of the standard Wiener process, independently of sample size. The problem is the strong autocorrelation of the Wiener process, which drastically broadens the distribution of the correlation coefficient.
The Moon is slowly receding from the Earth; the Andromeda galaxy is approaching the Milky Way. Therefore the respective distances between them have a large negative correlation. But there is no causal connection. More generally, any two time series, each of which exhibits a monotonic trend over time, will have a substantial correlation.
Control systems often exhibit large correlations between variables with no direct causal connection (and near-zero correlation between variables that do have a direct causal connection). Control systems are ubiquitous in the life sciences, social sciences, and technology. The causal relationships within them are cyclic, putting them outside the scope of Reichenbach’s principle and Pearl-style causal analysis.
(1) and (2) are well-known to people who analyse time series data, and there are standard methods (prewhitening and detrending respectively) for dealing with them. While (3) gets a mention from time to time, none of the papers I have seen on extending causal analysis to cyclic systems have (IMO) made much progress.
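Counterexample (1) is easy to reproduce, and prewhitening by first differences fixes it. A standard-library sketch of mine (the helper names are my own): two independent random walks frequently show a large spurious Pearson correlation, while the same test on their increments behaves honestly.

```python
import random
import statistics

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def random_walk(n, rng):
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0, 1)
        path.append(x)
    return path

def diff(series):
    """First differences: recovers the independent increments underlying a walk."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)
trials, n = 200, 500
raw_hits = diff_hits = 0
for _ in range(trials):
    w1, w2 = random_walk(n, rng), random_walk(n, rng)
    raw_hits += abs(pearson(w1, w2)) > 0.5               # spuriously large, often
    diff_hits += abs(pearson(diff(w1), diff(w2))) > 0.5  # essentially never
print(raw_hits / trials, diff_hits / trials)
```

Note that increasing n does nothing to the raw fraction: the autocorrelation, not the sample size, is the problem.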
Thank you for providing counterexamples! Quite useful, and it convinces me that the broadest version of this is false. And apologies for the slow reply—the linked papers took a while to absorb (as you may guess, I’m not a mathematician).
1. I tried for a while, but couldn’t properly follow this paper. From reading the abstract, going through the figures, your summary, and discussing it with Claude (Opus 4.6), I think the core issue here is the same problem as point 2 - the measures are not independent. My wife’s phrasing here is that “there’s a causal link between position at time t and position at time t+1” (in this case + noise, in the case of point 2 plus velocity times time between measurements).
2. Thank you—slightly embarrassed I didn’t find this one myself. I think the issue here is due to repeated measurement—the moon-earth distance now and at t+1s are obviously connected. I think the missing criterion here is independence—repeatedly measuring the same thing to see evolution over time breaks it. This requires some narrowing of the core thesis—my proposed added text is “I’m talking about correlations that survive proper statistical scrutiny—where the p-value is computed correctly for the data structure at hand. A naive Pearson correlation between two autocorrelated time series, or two monotonic trends, can look impressive while meaning nothing, but that’s not a real correlation any more than a loaded die produces a real test of probability. The well-known tools for handling this (differencing, detrending, cointegration tests, correcting for effective sample size) exist precisely because statisticians already understand that autocorrelation inflates apparent correlation. My claim applies to correlations that remain after you’ve done the statistics right.”
I think the claim “correlation of independent measures implies causation” is still interesting and surprising (probably not to this crowd who point out plenty of more rigorous prior work, but to most biologists at least), though a bit less exciting than my original claim. In particular, it’s less exciting because it doesn’t have a bulletproof checklist for “doing the statistics right”, which may or may not be possible to make.
3. This paper I could understand. I’ve worked through all their examples, and don’t think any are counterexamples to my point. Indeed, causation can exist without correlation (as you say, very common in biology and technology); I’ve never claimed otherwise. The paper shows some examples where correlation is high despite the causal path containing intermediates with lower correlation, which is intriguing, but I believe every correlation in the paper is explained by a causal path that links the two things correlated. As you say, not necessarily a direct causal connection, but still a “relatively short causal chain linking those things”—an indirect causal connection.
My favorite quote from the paper was “the simple maxim that ‘correlation does not imply causation’ having been superceded by methods such as those set out in [9, 14], and in shorter form in [10]”. Good to see that others (Pearl in addition to Reichenbach) have made something like the point I’m aiming to make here, as well as the restriction from time-series data that you brought to my attention in (2) - “cited limit attention to systems whose causal connections form a directed acyclic graph, together with certain further restrictions, and also do not consider dynamical systems or time series data.”
So far I don’t think cyclic systems cause any correlations between causally-unlinked variables, but the fact that people doing this formally haven’t solved it makes me hesitant to make any claims as an outsider to the field.
Thanks! I’ve been convinced of the general falsity of Reichenbach’s principle.
When we start introducing time into the mix I think it can be helpful to be somewhat more particular about how we define variables in a causal setting. When you have limited time resolution measurements of a quantity over time you can view it as a single “variable” that is involved in a causal cycle. But you could also “unroll” this cycle in time and view the quantity at each time as a separate variable. If you do this it seems to me like Pearl-style causal analysis generally holds. Even if X and Y have a monotonic pattern over time with prior time points of each series causing its later time points but no causal interaction between them, the X(t)s and Y(t)s aren’t correlated with each other, and the pattern over time is fully explained by the causal structure. This makes sense in a Pearl-style analysis because you would have an SCM for the series that looks something like X(t) = f(X(t-1), X(t-2), …), which is treating the X(t)s as separate variables. The correlation over time of X and Y isn’t a correlation between variables in this model; it involves mixing different variables together. If we treat them separately it seems like the Pearl-style analysis still works and makes no prediction errors, and in fact has the advantage of being robust to potential interventions.
Although I won’t claim to understand all the claims and concepts fully, I’ve found this paper to be interesting and helpful in this regard, and to me some of the concepts seem to have connections to the perspective I offer above.
Pearl-style SCMs assume that every single node in a graph is ontologically independent, which makes unrolled models as suggested not particularly great.
From a paper co-authored by Pearl himself:
( https://commonsensereasoning.org/2005/hopkins.pdf )
I haven’t been active in causality research in about 5 years, but I’m not aware of any good solutions to the time problem. I do know there are proposals for models that make improvements for causality involving sets of related variables, e.g.: platelet models. I think our own work on counterfactual probabilistic programming has a pretty strong basis, although the philosophy is fairly abridged in the paper.
tl;dr: The basic version of this is Reichenbach’s principle and is well-known. A lot of holes show up under a more advanced lens.
Other than the p-value part (would like to see your reasoning there—p-values are not probabilities, so this reads like a type error to me), this is Reichenbach’s principle ( https://plato.stanford.edu/entries/physics-Rpcc/ ): a correlation between A and B implies that either A causes B, B causes A, or there’s a common cause for both. (Or you are conditioning on something caused by both; c.f. Berkson’s paradox.)
Reichenbach’s principle is trivially true in Pearl-style causal networks, but philosophers argue that it’s false for the causal network of the whole universe. I haven’t found the short versions of the arguments convincing, but you can read about them in the linked SEP article.
There are a few things to keep in mind:
As the size of a causal network grows, the set of correlations grows far faster than the set of causal relationships, until almost all correlations become spurious. While there may be a causal chain in common, I’ll point out that the correlation between the set of nucleotides used by humans and by jellyfish is explained by an extremely long causal chain. Generally, common causes that are very far away lead to very weak correlations, but they can still be strong if each link in the causal chain is strong, as in DNA. So you might want to say “if the correlation is detectable with relatively low precision, then the causal chain linking them is probably short.” But if you are only willing to use low precision to detect correlations, then you run into the approximate unfaithfulness condition below.
A causal network is faithful if all correlations between variables can be explained by their relation in the causal network. In a deterministic setting such as within a program, causal networks generally are not faithful, so that you can have two variables on opposite sides of the program whose value is correlated for no good reason. In a more general setting, it’s easy to show that non-faithful networks are extremely unlikely under mild conditions, but Caroline Uhler showed that approximately unfaithful networks—networks which are close to an unfaithful network—are quite likely.
Thanks for a long and detailed comment! tl;dr—I wish I’d kept some of the more detailed footnotes from an earlier draft of the essay.
My phrasing there was insufficiently precise. You’re right to call out that (1- p-value) isn’t the probability of coincidence, because that’s ignoring the prior. I should have gone with something vaguer like “with the evidentiary strength you would expect based on the p-value of the correlation”.
I do more or less interpret p-values as probabilities, but in a world of theory rather than the real world—the summary I generally give is “probability of getting a result this extreme or more under the null hypothesis”, does that still seem like a type error?
I came across this SEP article while researching the essay, thinking surely I couldn’t be the first to think of the idea. Ultimately I cut the section about it because a) I found the SEP article fairly long and confusing and thought I could explain it more simply without referencing it and b) having discussed this point with dozens of people, including many with degrees in math and physics, exactly zero have ever brought up Reichenbach’s common cause, which led me to believe it wasn’t very well known (I’d guess at least 2 OOMs fewer people are familiar with that compared to the classic injunction “correlation does not imply causation”). That said, you’ve convinced me to add the footnote referencing Reichenbach’s common cause back in, thank you.
This is another point featured in an earlier version of the essay, inspired by the spurious correlations charts. Correlations between thousands of variables indeed grow extremely quickly; to me this is just an argument for correctly adjusting for multiple comparisons—pretty much all the p-values there are reported as <0.01, but with tens of thousands of comparisons of course you’d find a ton of those by chance and need to adjust for that—my claim is that very few of those would remain interesting after such an adjustment.
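A toy version of that adjustment, as a sketch of my own (the large-sample normal approximation for the correlation p-value is an assumption I’m making for simplicity): run a couple thousand null correlations, count how many pass p < 0.01 raw, then how many survive a Bonferroni correction.

```python
import math
import random
import statistics

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

rng = random.Random(0)
n, m = 200, 2000  # sample size per pair, number of independent pairs tested
pvals = []
for _ in range(m):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    z = abs(pearson(xs, ys)) * math.sqrt(n)    # approximately N(0, 1) under the null
    pvals.append(math.erfc(z / math.sqrt(2)))  # two-tailed p, normal approximation
raw_hits = sum(p < 0.01 for p in pvals)             # roughly 1% pass by chance
bonferroni_hits = sum(p < 0.01 / m for p in pvals)  # almost none survive correction
print(raw_hits, bonferroni_hits)
```

With 2,000 null tests you expect about 20 “significant” results at the raw threshold, which is exactly the spurious-correlations-chart phenomenon; the corrected threshold wipes them out.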
This is the point about dominoes in footnote 4; I can Necker-cube between the two views of it being an extremely long causal chain or a very short one, “they share common ancestry”. I find the shorter view generally easier to reason about and it’s the one I use most of the time.
Can you recommend any non-technical summary of or examples of non-faithful networks? The closest semantic match I found after a couple iterations of search was this paper, but it’s denser than I prefer. The core point
To dive into the weeds a bit: The phrasing “with the evidentiary strength you would expect based on the p-value of the correlation” works, but the issue with p-values is much stronger than “that’s ignoring the prior.” There’s another type error there. p-values and priors do not mix. A p-value is a supremum over probabilities from a space of hypotheses. Suppose you have two possible null hypotheses: one generates your data with probability 10%, but you assign 0.000000000001% confidence to this hypothesis. The other generates your data with probability 1%, and you assign 99.999999999999% confidence to this hypothesis. Then you want your p-value to be approximately 0.01. But it’s actually 0.1. Sorry. p-values are fundamentally frequentist, not Bayesian; frequentism rejects the idea that you can express confidence as a probability.
Not surprised. The mathematics of causality is still not part of a standard stats curriculum. I can’t fully blame them—I dove deep into this area in my first year of grad school, in 2015, and concluded the field is still fairly primitive. But a disproportionate number of people here are familiar with it. (Indeed, I was excited to dive into causality in grad school when I saw an excuse to do so precisely because of my exposure to the field’s existence through LessWrong.)
Indeed, just as I could describe the causal chain linking humans and jellyfish in a few words. I would like there to be a formal way to render both DNA and dominoes as a short causal chain. But I don’t have one.
Think this got cut off.
A classic example from Pearl’s 2009 book: A and B are fair 0/1 coins, and C is their xor. Then the sets {A, C} and {B, C} each have pairwise independence, even though there are causal links A->C and B->C.
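The example is small enough to verify by exact enumeration rather than simulation; this is my own sketch, with an ad hoc `cov` helper over the four equally likely outcomes.

```python
from itertools import product

# the four equally likely outcomes of (A, B, C) with C = A xor B
outcomes = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

def cov(i, j, rows):
    """Covariance of components i and j over equally weighted rows."""
    e = lambda f: sum(f(r) for r in rows) / len(rows)
    return e(lambda r: r[i] * r[j]) - e(lambda r: r[i]) * e(lambda r: r[j])

print(cov(0, 2, outcomes), cov(1, 2, outcomes))  # 0.0 0.0: A,C and B,C are uncorrelated
# conditioning on B = 0 makes C identical to A, exposing the causal link A -> C:
print(cov(0, 2, [r for r in outcomes if r[1] == 0]))  # 0.25
```

Conditioning on the third variable is what reveals the dependence, which is also why collider-hunting algorithms like PC lean on exactly this pattern.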
Thanks for a detailed reply!
I guess this is what I get for my statistics-knowledge being an odd mongrel mix of Bayesian vibes gathered from LessWrong and frequentist stats from biology grad school. You seem much more knowledgeable on the formalizations so I trust you’re right that they can’t formally mix, but informally to me it seems like there must be some way to mix them—fundamentally, this is just “extraordinary claims require extraordinary evidence”. To use an example from my day job: if I’m testing whether knocking out a candidate gene that I think will increase tryptophan in a plant tissue does indeed increase the tryptophan, p < 0.05 feels like a fine cutoff. If I’m testing whether that same KO causes my plants to communicate with me telepathically, I’d be crazy to tell anyone about my results unless I was seeing p < 0.001 in at least two independent experiments.
The difference in required p-value threshold to me seems to come down to the prior. Perhaps there’s no formal framework that combines them, but empirically I think that’s what I’m doing.
You seem to be correct here, but it strikes me as strange because quantifying the evidence for causality was one of the central themes of most of my stats classes (with names like “intro statistics” and “statistical design of experiments”).
Possibly the synthesis here is that most of what non-math people learn doesn’t qualify as real math—I only took the super basic stuff that doesn’t get into the actual math of causality (despite minoring in math in undergrad and taking more math than most in a bio PhD).
Thinking about it a bit more, I think I’m happy to bite the bullet here and say the causal chain is long (though well-approximated by both of our very short descriptions), and that this is one of the exceptions to the point in footnote 4 that real-world correlations are generally well below 1. The causal chain is quite long, but DNA replication is extremely good—something like 10^-7 errors per base of DNA per generation—so the per-link fidelity is pretty dang close to 1 (my guess is domino setups by hobbyists also have failure rates under 10^-3). I’m sure we could find a handful more examples of things with long chains of very high correlation, but not that many—pretty few in biology are over 0.95.
Thanks, that was fun to work out, and appropriately simple for a humble biologist. In the 2x2 matrix of “corr or not” and “causal linkage or not”, this is the opposite square to the one I’m looking for—causal but not correlated. I agree that such things happen (they happen a lot in biology due to regulatory feedback loops), and I now see that this is indeed a non-faithful network.
Is there a similar toy example for “corr but no causal linkage”, e.g. “two variables on opposite sides of the program whose value is correlated for no good reason”? I spent a while asking Claude Opus 4.6 for one and it couldn’t come up with any (it came up with a toy where two variables would always be the same constant, but correlation there is undefined so I don’t count it).
As a Pearl-style half-Bayesian (see here) who chose the path of logic over stats more than a decade ago, I am not at all a good representative of the frequentist school. My impression though is that they would say these kinds of biases and assumptions live outside the domain of stats.
A Bayesian would just tell you to use Bayesian techniques such as credible intervals and maximum a posteriori estimators in lieu of p-values, which I have mostly not studied. I believe Bayesian techniques as a whole are conceptually simpler but computationally more difficult than frequentist ones.
Well spoken!
So first, if you’re talking about literal correlation coefficients, you can come up with examples where Y is a deterministic function of X, but their correlation coefficient is 0. Correlation coefficients measure linear relationships. Pretty sure you can do it using a piecewise-linear function with just two pieces, and probably also with a simple parabola.
(Just searched: yes you can. See page 4 here.)
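The parabola case can also be checked exactly on a tiny symmetric sample (my own sketch, with a hand-rolled `pearson` helper): Y is a deterministic function of X, yet the Pearson coefficient comes out exactly zero because the relationship has no linear component.

```python
import statistics

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

xs = [-2, -1, 0, 1, 2]    # symmetric around zero
ys = [x * x for x in xs]  # Y is a deterministic function of X
print(pearson(xs, ys))    # exactly 0.0
```

The cross-products cancel in pairs by symmetry, so this is exact rather than approximate.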
When we speak of correlation, I think it’s more helpful to consider mutual information and conditional mutual information, which capture all dependencies, not merely linear ones. (At least theoretically—convergence properties of conditional mutual information estimation are terrible, and a reason I gave up on my first causality project in grad school.)
Mutual information between two constants is defined. But it’s always 0. So that’s still a non-example.
I confess that I actually had to look up how to get spurious correlations in a causal network. Here’s an example showing you can do it even without determinism.
(Source: https://www.researchgate.net/figure/An-example-of-unfaithful-network-applying-d-separation-rules-to-the-DAG-would-indicate_fig15_392864356 )
....and then I realized this is still the opposite of what you’re looking for. Oops. I should have just gone to sleep.
Okay, you might be right about the spurious correlations not being possible in Pearl-style causal graphs.
What I recall from my investigations into this in 2015: (1) I was mostly concerned with trying to apply the PC algorithm for causal structure discovery to programs. PC essentially operates by looking for colliders, i.e.: independent variables that become dependent when conditioned on a mutual child. So spurious non-correlations were a bigger problem for me. (2) I found an old comment of mine where I mentioned having to condition on events of probability 0 to apply the do-calculus.
I feel like this is using the word somewhat differently than is meant by the phrase you are discussing. I’ve always interpreted “correlation doesn’t imply causation” to mean that if X and Y are correlated you can’t necessarily say that X → Y or Y → X (probably based on some prior about the direction like timing), not that correlation is somehow completely unrelated to causation.
Congrats on always interpreting the phrase that way! Of the folks I’ve discussed this with in person, every one of them had the interpretation that things can be correlated without either one causing the other or a common cause.
I would be surprised if “correlation doesn’t imply causation” would have become so popular if most people interpreted it as strictly expanding the options from X → Y or Y → X to include common cause (and now might ask a stats-professor friend to run a poll to see which of these interpretations most people hold in their intro stats class), but I think your interpretation is correct. I certainly don’t want to attack a strawman version of the phrase, but if >30% of people interpret it that way I’ll conclude it’s not a strawman.