When is correlation transitive?

It’s a well-known property of correlation that it’s not transitive in general. If $X, Y, Z$ are three real-valued random variables such that $\operatorname{corr}(X, Y) > 0$ and $\operatorname{corr}(Y, Z) > 0$, it doesn’t have to be the case that $\operatorname{corr}(X, Z) > 0$.

Nevertheless, there are some circumstances under which correlation is transitive. I will focus on two such cases in this post.

Primer: correlation as an inner product

For what follows, we need one piece of background: correlations of real-valued random variables with finite second moments can be regarded as inner products in an appropriate Hilbert space.

Specifically, if $X, Y$ are two such random variables with zero mean and unit standard deviation, which is a simplification we can always make as correlation is invariant under translation and scalar multiplication, then we can compute

$$\operatorname{corr}(X, Y) = \frac{\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]}{\sigma_X \sigma_Y} = \mathbb{E}[XY].$$

The pairing $\langle X, Y \rangle = \mathbb{E}[XY]$ defines an inner product on the space of random variables with finite second moments, where two random variables are considered equivalent if they are equal with probability $1$ (almost surely). The properties that we expect out of an inner product are easy to check: the pairing is obviously symmetric, bilinear, and positive definite.

Furthermore, it turns out this inner product makes the space of random variables with finite second moments into a Hilbert space: the vector space is complete under the induced norm $\|X\| = \sqrt{\langle X, X \rangle} = \sqrt{\mathbb{E}[X^2]}$. (This is just the familiar space $L^2$.) Roughly speaking, this means that we can take orthogonal projections onto closed subspaces with impunity.
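To make this concrete, here’s a quick numerical sanity check in Python: after standardizing two samples, their correlation is just the empirical mean of their product, i.e. the inner product above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated samples standing in for the random variables X and Y.
x = rng.normal(size=100_000)
y = 0.6 * x + 0.8 * rng.normal(size=100_000)

def standardize(v):
    # Center to mean zero and scale to unit standard deviation.
    return (v - v.mean()) / v.std()

# After standardization, correlation is the inner product <X, Y> = E[XY],
# estimated here by the sample mean of the product.
print(np.mean(standardize(x) * standardize(y)))
print(np.corrcoef(x, y)[0, 1])  # agrees with the standard estimator
```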

Now that we have this framework, we can move on to the main results of this post.

Correlation is transitive when the correlations are sufficiently strong

I’ll first prove the following:

Claim 1: If $\operatorname{corr}(X, Y) = a$ and $\operatorname{corr}(Y, Z) = b$, then

$$ab - \sqrt{1 - a^2} \sqrt{1 - b^2} \leq \operatorname{corr}(X, Z) \leq ab + \sqrt{1 - a^2} \sqrt{1 - b^2}.$$

Moreover, these bounds are tight: for any $(a, b) \in [-1, 1]^2$, there is a combination $X, Y, Z$ for which we can make either the right or the left inequality into an equality.

Proof

We can assume $X, Y, Z$ have mean zero and unit variance without loss of generality. Taking orthogonal projections of $X$ and $Z$ onto the one-dimensional subspace spanned by $Y$, we can write

$$X = aY + \sqrt{1 - a^2}\, E_X, \qquad Z = bY + \sqrt{1 - b^2}\, E_Z,$$

where $\langle E_X, Y \rangle = \langle E_Z, Y \rangle = 0$ and the random variables $E_X, E_Z$ have mean zero and variance $1$. Taking inner products gives

$$\operatorname{corr}(X, Z) = \langle X, Z \rangle = ab + \sqrt{1 - a^2} \sqrt{1 - b^2}\, \langle E_X, E_Z \rangle.$$

Using the Cauchy-Schwarz inequality for our inner product finishes the proof: $\lvert \langle E_X, E_Z \rangle \rvert \leq \lVert E_X \rVert \, \lVert E_Z \rVert = 1$. For the existence proof, let $Y$ be an arbitrary random variable with mean zero and unit variance and pick $E_X, E_Z$ to be perfectly correlated or perfectly anti-correlated standard Gaussians that are uncorrelated with $Y$.
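We can check the tightness construction numerically: taking $E_Z = \pm E_X$ should push the sampled correlation of $X$ and $Z$ to the upper or lower bound. A quick Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, b = 1_000_000, 0.8, 0.7

# Y and E are independent standard Gaussians, hence uncorrelated;
# E plays the role of E_X, and ±E plays the role of E_Z.
y = rng.normal(size=n)
e = rng.normal(size=n)

x = a * y + np.sqrt(1 - a**2) * e
for sign, label in [(+1, "upper"), (-1, "lower")]:
    z = b * y + sign * np.sqrt(1 - b**2) * e  # E_Z = ±E_X
    bound = a * b + sign * np.sqrt(1 - a**2) * np.sqrt(1 - b**2)
    print(label, np.corrcoef(x, z)[0, 1], bound)  # empirical vs. exact
```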

Interpretation

When $a$ and $b$ are large and positive, the lower bound $ab - \sqrt{1 - a^2} \sqrt{1 - b^2}$ is also positive, and so we have a guaranteed positive correlation between $X$ and $Z$.

One way to simplify this is to make it one-dimensional by assuming $a = b = \rho$. In this case, the lower bound is $\rho^2 - (1 - \rho^2) = 2\rho^2 - 1$. If we want a guaranteed positive correlation between $X$ and $Z$, this means the correlations have to satisfy $\rho > 1/\sqrt{2} \approx 0.707$.
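Tabulating the lower bound $2\rho^2 - 1$ makes it easy to see where the guarantee kicks in:

```python
import numpy as np

# Lower bound on corr(X, Z) when corr(X, Y) = corr(Y, Z) = rho;
# it only becomes positive past rho = 1/sqrt(2).
for rho in [0.3, 0.5, 0.7, 1 / np.sqrt(2), 0.8, 0.9]:
    print(f"rho = {rho:.3f}  ->  lower bound = {2 * rho**2 - 1:+.3f}")
```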

This condition is quite strict, and we might wonder if some transitivity of correlation can be recovered in the absence of such strong correlations between $X, Y$ and $Y, Z$. It turns out the answer is yes, at least if we assume the random variables are in some sense “generic”.

Correlation is transitive on the average

It turns out that in a suitable sense, when $X, Y$ and $Y, Z$ are positively correlated, there is a tendency for $X, Z$ to also be positively correlated, even though per Claim 1 we can’t deduce that they must be positively correlated.

The precise version of this claim is as follows:

Claim 2: Let $x, y, z$ be vectors independently and uniformly distributed on the $(n-1)$-dimensional unit sphere $S^{n-1} \subset \mathbb{R}^n$, and let $a, b \in [-1, 1]$ be two real numbers. Denote the standard inner product on $\mathbb{R}^n$ by $\langle \cdot, \cdot \rangle$. Then, we have

$$\mathbb{E}\big[ \langle x, z \rangle \mid \langle x, y \rangle = a,\ \langle y, z \rangle = b \big] = ab.$$

In intuitive terms, this claim is saying that if $X, Y, Z$ are “generic” random variables such that $\operatorname{corr}(X, Y) = a$ and $\operatorname{corr}(Y, Z) = b$, we should on average expect that $\operatorname{corr}(X, Z) = ab$.

The connection of this geometric claim to correlations is straightforward once we know that correlations can be interpreted as inner products. Indeed, if we consider real-valued random variables on a finite probability space with $n$ elements, then we can identify the space of random variables with $\mathbb{R}^n$, the space of random variables of mean zero with a hyperplane $\mathbb{R}^{n-1}$, and the space of random variables with mean zero and of unit variance with the unit sphere $S^{n-2}$ inside that hyperplane. The appropriate notion of considering a “generic” or “average” case here is to make our setup rotationally invariant, and that’s exactly what the uniform distribution on the unit sphere does.
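To make the identification concrete, here’s a small Python sketch: mapping a random variable on $n$ equally likely outcomes to a unit vector by centering and normalizing, the Euclidean inner product of two such vectors recovers their Pearson correlation exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # a finite probability space with n equally likely outcomes

def to_sphere(v):
    # Center (mean zero) and normalize, mapping the random variable
    # to a point on the unit sphere inside the mean-zero hyperplane.
    w = v - v.mean()
    return w / np.linalg.norm(w)

x, y = rng.normal(size=n), rng.normal(size=n)
print(np.dot(to_sphere(x), to_sphere(y)))  # inner product on the sphere
print(np.corrcoef(x, y)[0, 1])             # equals the Pearson correlation
```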

Now that we understand the connection to correlations, let’s see how to prove this claim.

Proof

As with the proof of the first claim, we take the familiar orthogonal projections

$$x = ay + \sqrt{1 - a^2}\, u, \qquad z = by + \sqrt{1 - b^2}\, v,$$

where $u, v$ are unit vectors orthogonal to $y$, and write the inner product we care about as

$$\langle x, z \rangle = ab + \sqrt{1 - a^2} \sqrt{1 - b^2}\, \langle u, v \rangle.$$

Now, let’s think about what the joint distribution of $u$ and $v$ is like. These vectors could be anything orthogonal to $y$, and the unit vectors orthogonal to $y$ are located on an “equator” or “great circle” of $S^{n-1}$ which is isomorphic to $S^{n-2}$. As the measure we’re taking expectations over on $S^{n-1}$ is rotationally invariant, we notice that the individual distributions of $u$ and $v$ must be uniform on the great circle. Furthermore, as we can rotate $x$ and $z$ independently without affecting the conditioning, we also deduce that their distributions are independent conditional on $y$.

Once we know all of this, it’s clear that we must have $\mathbb{E}[\langle u, v \rangle \mid y] = 0$ because of reflection symmetry: for any pair $(u, v)$ with positive inner product, we can simply reflect $v$ across the origin while staying on the great circle orthogonal to $y$ and find a pair with an inner product having equal magnitude but opposite sign, so the expectation must vanish.

Then, defining $c = \sqrt{1 - a^2} \sqrt{1 - b^2}$ for ease of notation, Adam’s law (the law of iterated expectations) finishes the proof:

$$\mathbb{E}[\langle x, z \rangle] = \mathbb{E}\big[ \mathbb{E}[ab + c \langle u, v \rangle \mid y] \big] = ab + c\, \mathbb{E}\big[ \mathbb{E}[\langle u, v \rangle \mid y] \big] = ab.$$
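We can verify the claim by Monte Carlo, sampling $u$ and $v$ uniformly from the equator orthogonal to $y$ exactly as in the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, a, b = 20, 200_000, 0.6, 0.4

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# y is uniform on the unit sphere S^(n-1).
y = unit(rng.normal(size=(trials, n)))

def equator_sample(y):
    # Uniform unit vector orthogonal to y: take a Gaussian vector,
    # project out its y-component, and normalize.
    g = rng.normal(size=y.shape)
    g -= np.sum(g * y, axis=-1, keepdims=True) * y
    return unit(g)

u, v = equator_sample(y), equator_sample(y)
x = a * y + np.sqrt(1 - a**2) * u  # <x, y> = a by construction
z = b * y + np.sqrt(1 - b**2) * v  # <y, z> = b by construction
print(np.mean(np.sum(x * z, axis=-1)), a * b)  # empirical mean vs. ab
```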

Interpretation

The above statement is essentially a formalization of the Bayesian claim that “when $X, Y$ are positively correlated and $Y, Z$ are positively correlated, it’s likely that $X, Z$ are positively correlated as well”. So while the conditions for correlation to be strictly transitive are rather stringent, as laid out in Claim 1, a weaker notion of “transitivity in expectation” holds for arbitrarily weak positive correlations, as $ab$ is strictly positive whenever both $a$ and $b$ are.

However, the statement is also a negative result in the sense that it implies a quadratic decay in the strength of the correlation we should expect. If $\operatorname{corr}(X, Y) = \operatorname{corr}(Y, Z) = \rho$ and $X, Z$ are otherwise generic random variables, we should only expect $\operatorname{corr}(X, Z) \approx \rho^2$ on average. Because of this second-order scaling, relationships which are quite strong may become weak in expectation once we go through a “transitivity link”. For instance, a correlation of, say, $0.5$ is considered quite strong in the social sciences where everything is noisy, but a correlation of $0.25$ is not.

Why write this post?

Correlations come up in many contexts, but most people don’t have very good intuition for how correlation behaves in general. I quite often see arguments of the form “A is correlated with B and B is correlated with C, so A should correlate with C” or “it’s not surprising that A would correlate with C”. However, it’s important to have quantitative intuitions for what to expect here. For example, if $\operatorname{corr}(A, B) = a$ and $\operatorname{corr}(B, C) = b$ but $\operatorname{corr}(A, C)$ is substantially larger than $ab$, that suggests $A$ and $C$ are more strongly correlated than we would expect just on the basis of the connection between them that goes through $B$: this would be some evidence for a latent factor that all three variables have some kind of loading on.

I hope that this post helps clear up some confusion about the properties of correlations involving three or more variables and what we can and can’t expect from correlation in such contexts.