Causality is Everywhere

TL;DR: Everything in our world is causally connected in countless ways, which means causal effects are everywhere: practically anything has systematic effects on almost everything else. It’s just that most of these effects are very close to (but not quite) zero. Studies often fail to detect small effects because they are underpowered, which frequently leads people to believe that some effect “does not exist” (i.e., is 0). I argue that this is a mistake, that almost no effect is truly 0, and that it’s worthwhile to be nuanced about this for a variety of reasons.

In the quest for understanding the world, we often fall into the dichotomy of asking whether X affects Y. Does diet influence mood? Does music affect productivity? Is there a relationship between the stars’ positions at your birth and your personality? The common thread in these questions is our search for causality, mapping the invisible lines that shape reality.

Yet this pursuit often narrows our vision, framing our questions as a binary of effect versus no effect. This binary is a simplification, perhaps a reasonable one for many practical concerns, but a simplification nonetheless. What if, instead of asking whether there is an effect, we assumed there always is one and asked about its size and sign rather than its mere existence? When we ask “is there an effect?”, we imply that there’s a substantial chance of the answer being “no”, i.e. that X simply has no impact on Y whatsoever. But this, I argue, is a mistake, and one that hints at a fundamental misconception about the nature of causality in our infinitely interconnected world.

Causality is everywhere. Of course, we all know that correlation is not causation, and most causal effects may indeed be swamped by confounding, by noise, or by any of the countless biases working against science’s pursuit of truth. Yet the fact that causality is hard to see doesn’t make it any less real. As gwern explains, correlation is everywhere. I argue that the same holds for causation (even if at a much smaller scale).

Everything is connected. Not in an esoteric sense, but in the most physical way possible: as long as two events in the universe lie within each other’s light cones, they will almost always have some causal connection, and usually quite a number of them. This is particularly true for everyday human matters and for the types of questions that are typically investigated in scientific studies, including RCTs. It’s easy to construct a question where causation is clearly impossible (does the composition of my breakfast today affect my blood pressure yesterday?), but nobody seriously asks such questions. In practice, we tend to ask the kinds of questions where causal connections are very much plausible. Does protein intake affect longevity? Do school uniforms affect student performance? Do violent video games affect violent tendencies in teenagers? Does music affect productivity? For all of these questions, it would be very surprising if the dense causal net of reality had no direct or indirect path connecting the two properties. And so I argue that causal connections are present in practically every case we would think to ask about, and that they will lead to non-zero effects in most metrics we could measure.

The Non-Existence of Effect Size Zero

As soon as there are causal pathways from one thing to another, there will be systematic effects. Some of these pathways may, on average, have a positive effect on the measured outcome, and some may have a negative effect. And they will sum up to some total effect (which may of course differ between people, and even between circumstances for any given person).

Let’s say the question we want to answer is how, if at all, classical music affects productivity. There are clearly some causal pathways, even if their exact nature is hard to assess. When a person hears music, this has a number of effects, on both a conscious and a subconscious level. It may affect their mood, their heart rate, their physical movements, their thoughts, and their attention, and all of these things will have some effect on whatever we take as a measure of productivity. Any given one of these causal pathways may have only a tiny effect on average, such as increasing the productivity of some population by 0.1%, or decreasing it by 0.03%, but these are real, systematic, non-zero effects nonetheless. And if we sum up all these individual effects, it’s exceedingly unlikely that the total ends up at exactly 0.0%.
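To make the summing argument concrete, here is a minimal Python toy simulation. It is an illustration under made-up assumptions, not a model of real music research. It assumes 20 independent pathways whose average effects are drawn from a normal distribution with mean 0 and a standard deviation of 0.05 percentage points. It then checks how often their sum comes out exactly 0. The function names and parameters are hypothetical.

```python
import random


def total_effect(n_pathways: int, scale: float, rng: random.Random) -> float:
    """Sum the average effects of n_pathways causal pathways.

    Toy assumption: each pathway's effect (in percent change of the outcome)
    is an independent draw from Normal(0, scale**2), so some pathways push
    the outcome up and others push it down.
    """
    return sum(rng.gauss(0.0, scale) for _ in range(n_pathways))


def fraction_exactly_zero(trials: int = 10_000, n_pathways: int = 20,
                          scale: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated total effects that are exactly 0.0."""
    rng = random.Random(seed)
    zeros = sum(
        1 for _ in range(trials)
        if total_effect(n_pathways, scale, rng) == 0.0
    )
    return zeros / trials


if __name__ == "__main__":
    rng = random.Random(1)
    # A few simulated totals (exact values depend on the seed).
    for _ in range(5):
        print(f"{total_effect(20, 0.05, rng):+.5f}%")
    print("share exactly zero:", fraction_exactly_zero())  # 0.0
```

The specific numbers are invented. The point is that a sum of continuous-valued effects landing exactly on zero would require a perfect, coincidental cancellation.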

There’s generally a continuum of possible effect sizes. Most effects are close to 0. Depending on how effects are selected, maybe they’re distributed somewhat like a normal distribution, with some effect sizes being positive and some negative:

And while 0 is the mode of this distribution, it’s still just a single point of width 0 on a continuum, meaning the probability of any given effect size being exactly 0, represented by the area of the red line in the picture, is almost 0.
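Here is a quick numeric illustration of that point. Assume, purely hypothetically, that effect sizes follow a normal distribution with standard deviation σ = 1 in some arbitrary unit. The probability of an effect landing within ±ε of zero is then erf(ε/(σ√2)). That probability shrinks toward 0 as the band narrows and is exactly 0 for ε = 0. The helper `prob_within` is an illustrative name, not an existing API.

```python
import math


def prob_within(eps: float, sigma: float = 1.0) -> float:
    """P(|X| < eps) for X ~ Normal(0, sigma**2).

    The band [-eps, eps] has probability erf(eps / (sigma * sqrt(2))).
    A single point (eps = 0) has probability exactly 0.
    """
    if eps <= 0.0:
        return 0.0
    return math.erf(eps / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001, 0.0):
        print(f"P(|effect| < {eps}) = {prob_within(eps):.6f}")
```

Any band around zero has some probability mass, but a single point has none.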

You could, of course, argue that some tiny effect sizes are perhaps not technically 0 but “sufficiently close” to 0 to be considered irrelevant, and that we shouldn’t waste our time on an intervention that increases or decreases productivity by something like 0.1%, or in many cases even less than that. This is particularly true given that we may never be able to reliably measure such an effect with a noisy and vague metric like “productivity”. Moreover, when the average effect size is this small, we often can’t make any meaningful predictions for a given individual, because the differences between individuals may be much larger than the average effect itself.
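The point about individual differences can be quantified with the common-language effect size (also called the probability of superiority). Assume, for illustration, that the outcome is normally distributed with the same standard deviation with and without the intervention. Then a standardized mean difference d means that a randomly chosen treated person outscores a randomly chosen untreated person with probability Φ(d/√2). The function below is a hypothetical helper built on Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def prob_superiority(d: float) -> float:
    """Common-language effect size for two normal groups with equal SD.

    d is the standardized mean difference (Cohen's d). The difference between
    one random treated and one random control individual is normal with mean
    d and variance 2 (in SD units), so P(treated > control) = Phi(d / sqrt(2)).
    """
    return NormalDist().cdf(d / 2 ** 0.5)


if __name__ == "__main__":
    for d in (0.0, 0.01, 0.1, 0.5):
        print(f"d = {d:<4}  P(treated > control) = {prob_superiority(d):.4f}")
```

An effect of d = 0.01 gives a probability of about 0.503, barely above a coin flip. This is why such an effect says almost nothing about any single person, even if it is real on average.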

To all of that, I would reply: where do we draw the line? I agree that a 0.1% change in productivity may not be worth our attention, given that there are many interventions with much larger effects on productivity, from caffeine intake to hydration to air quality. Generalizing from such small effects is also hard. But I still think it’s important to have a nuanced model of the world, and to at least appreciate that such small non-zero effects exist, even in cases where we can’t easily identify them.

Why This Matters

So why even bother with such technicalities? Does it really matter whether some very-close-to-but-not-quite-zero effect exists? I think it does, for a number of reasons.

1. Nuanced Thinking

  • If we don’t appreciate that effects are everywhere, even if tiny, it’s easy to fall into black-and-white “is there an effect or not” thinking.

  • Such black-and-white thinking arguably nudges us toward a prior like “there either is some noteworthy effect, or there is none”, which assigns a high probability to effect sizes being exactly 0.

  • I suspect that this leads both to underestimating tiny effects (because we assume they are 0) and to overestimating effect sizes once we assume an effect exists.

  • It can also lead us to commit prematurely to one effect direction: will classical music increase productivity or not? When asking such questions, it’s easy to forget that classical music might actually decrease productivity.

2. Interpreting Study Results

  • In my experience, many people, even those who work with data professionally, misunderstand what statistical significance means and seem to use the term almost interchangeably with effect size. I’d say that calling an effect significant is already a mistake: significance is a property of the measurement, not of the (supposed) underlying causal effect. It gives us a hint at how reliable a measurement (or study result) is, but in itself tells us very little about how large an effect is, since any level of significance can occur with any effect size, depending on sample size and noise[1].

  • Relatedly, when a study finds a “non-significant effect” of X on Y, this does not mean that X does not affect Y; X may well cause an increase in Y while the study is simply underpowered to detect it[2].

  • Another common pattern is “study X finds that A affects B”, followed by a failed replication attempt where “study Y shows that A does not affect B”. I suspect that in many such cases (at least those where the initial finding is not overly surprising, and one would intuitively expect the effect to point in the reported direction rather than the opposite one), an effect does in fact exist and is simply smaller than the initial study found, due to regression to the mean. It definitely makes sense to put less trust in the initial study in such cases, but it would be an overcorrection to conclude from the failed replication that “no effect exists”, particularly if the replication also measured an effect pointing in the same direction as the initial study and “merely” failed to reach statistical significance.

  • I’m not arguing that we should be less skeptical of study findings, or that we should take the replication crisis less seriously. Rather, my point is that some skepticism toward failed replications is warranted as well, and that the particular conclusion “there is no effect” is almost always wrong. “We’re unsure about the true effect size and hence treat it as if it were 0” might be an appropriate stance, but probably only when we have a strong prior that the effect size really is extremely close to 0, not whenever a replication attempt fails to reach significance.
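The sketch below illustrates how significance and effect size come apart. It uses a two-sided two-sample z-test on a standardized mean difference d with n participants per group. This is a normal approximation; a t-test would give somewhat larger p-values for small n. It also includes the standard power-analysis formula for the number of participants needed per group. Both functions are hypothetical helpers for illustration, not part of any particular statistics library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_value(d: float, n: int) -> float:
    """Two-sided p-value of a two-sample z-test when the observed
    standardized mean difference is d and each group has n people."""
    z = d / math.sqrt(2.0 / n)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants per group needed to detect a true effect d
    with a two-sided test at level alpha and the given power."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    # A large effect in a tiny study vs. a tiny effect in a huge study.
    print(f"d = 0.80, n = 10:     p = {p_value(0.80, 10):.3f}")
    print(f"d = 0.02, n = 20000:  p = {p_value(0.02, 20_000):.3f}")
    # The required sample size grows with 1 / d**2.
    for d in (0.5, 0.2, 0.05):
        print(f"d = {d}: about {n_per_group(d)} per group for 80% power")
```

Here the large effect misses the conventional 0.05 threshold (p ≈ 0.07), while the effect forty times smaller clears it (p ≈ 0.046). The sample-size formula also shows why a real but small effect is easily missed: shrinking d tenfold multiplies the required sample a hundredfold.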

3. Nuanced Communication

  • Imagine the following short conversation. Bob: “I believe drinking more water on average improves people’s mood.” Alice: “I believe it doesn’t.” They certainly seem to disagree. But do they actually? It’s impossible to say without quantification. Maybe Bob means a 0.1-point improvement on a 0 to 10 mood scale. Alice would probably be much slower to disagree with such a claim than with a purely qualitative one.

  • In fact, there are at least two levels on which Alice and Bob may disagree here, and it’s easy to overlook the distinction. They may disagree about whether drinking more water has any positive effect on mood at all (i.e. Alice thinks it has a negative or zero effect). Or they may simply disagree about the threshold at which a positive effect warrants being pointed out and called an “improvement”.

  • Similarly, when Bob claims “The color of a car does not affect its risk of ending up in an accident”, what exactly does he mean? Is he really convinced that if you ran an RCT with a billion cars of randomly chosen colors, you would find no statistically significant difference between the accident rates of red cars and black cars? Maybe he does mean that (and if he does, I’d be happy to bet at 999:1 odds that he’s wrong). But maybe he just means “the effect size is negligible compared to other, much more relevant factors, such as following the speed limit and having properly adjusted mirrors and headlights”. In that case he may well be right[3], but that’s a very different claim from what he actually said.
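Here is a rough back-of-the-envelope check on the billion-car thought experiment. The sketch computes the z-statistic a pooled two-proportion test would produce if the observed accident rates matched some assumed true rates exactly. All numbers are hypothetical placeholders, not real accident statistics:

- a 5% accident probability over the study period for black cars;
- red cars being 0.1% (relative) riskier;
- up to 500 million cars per color.

```python
import math
from statistics import NormalDist


def two_proportion_test(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test, evaluated at the given proportions.

    Returns (z, two-sided p-value). Plugging in assumed true rates gives
    the z-statistic one would expect, not a simulated study outcome.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    black = 0.05           # hypothetical accident probability
    red = black * 1.001    # hypothetical 0.1% relative increase
    for n in (1_000, 1_000_000, 500_000_000):
        z, p = two_proportion_test(black, red, n, n)
        print(f"{n:>11,} cars per color: z = {z:6.2f}, p = {p:.4f}")
```

Under these made-up numbers, the difference would be invisible with a thousand or even a million cars per color. With half a billion per color it would be clearly significant (z ≈ 3.6).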

4. Having Good Priors even when there is Little Empirical Evidence

  • If we think about what the most dominant causal pathways between two things are likely to be, we can at times predict whether the effect from one on the other is more likely to be positive or negative. If there are clear reasons to assume that A might increase B, and no clear reasons to assume that A might decrease B, then it’s very likely that A does indeed increase B.

  • I’d also expect that it’s possible to improve one’s calibration about the sizes of such effects, as long as there are some cases where very small effect sizes have been reliably demonstrated.

  • There are many domains where we can’t get reliable causal data. Maybe because an intervention is difficult or expensive, or because the measurement itself is difficult to pull off, or because effect sizes are likely so small that it would take extremely large studies to reliably detect them, or because running studies on the thing would be unethical. In all such cases, we should do our best to look at the most likely causal pathways in order to assess in which direction the effect is likely to point.

5. Establishing Helpful Micro Habits

  • If you can find extremely cheap adjustments to your routine or environment which are more likely to have positive than negative effects, they may be worth applying even if there’s little concrete evidence for their efficacy.

  • Many trivial changes might have effect sizes of <1%. These are often impractical to verify because there’s little incentive to run huge, expensive studies in the hope of maybe finding a tiny effect, with a real risk of finding nothing significant.

  • Still, if you can find a bunch of interventions that cost you almost nothing, but which may on average improve some metric you care about by half a percent (such as your happiness, your impact, your productivity, your salary), that can be a pretty good deal, and one we’d be missing out on if we simply doubted the existence of all such effects merely because no study has ever been large enough to show them.
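For intuition about stacking, here is a minimal sketch that combines several small effects multiplicatively. It assumes, optimistically, that the effects are independent and don’t interact; in reality some may overlap or cancel. The helper name `combined_effect` is hypothetical.

```python
def combined_effect(effects: list[float]) -> float:
    """Combine relative effects (0.005 = +0.5%), assuming they multiply
    independently, and return the overall relative change."""
    total = 1.0
    for effect in effects:
        total *= 1.0 + effect
    return total - 1.0


if __name__ == "__main__":
    ten_habits = [0.005] * 10            # ten hypothetical +0.5% habits
    print(f"{combined_effect(ten_habits):.2%}")
    mixed = [0.005] * 8 + [-0.005] * 2   # two of them actually backfire
    print(f"{combined_effect(mixed):.2%}")
```

Ten habits at +0.5% each come to about 5.1% in total. Even if two of them turn out to be equally small negatives, the bundle still nets roughly +3%.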

Closing Words

The replication crisis is real, and people, websites, and newspapers still very often err on the side of mistaking correlation for causation. It’s undoubtedly a good idea to remain skeptical of any study with outlandish claims and surprising results. Many systematic problems exist both in science and in our individual thinking that cause us to see patterns where none exist.

With all that said, there’s the flip side: being quick to conclude that some effect does not exist is an overcorrection, and comes at the price of overlooking and misjudging many real effects. If we take for granted that basically everything affects everything else, and it’s always a question of the effect size and direction rather than the effect’s existence, we can achieve a more accurate model of the world, and probably make better predictions about causal effects, even in cases where effects are so small and so hard to measure that we might never find out for sure.

Most effects are tiny, yet a tiny effect applied at a societal level can still affect the lives of many thousands of people. At the same time, stacking a bunch of tiny effects can lead to bigger outcomes. So getting this right is likely worth at least a bit of our attention.

  1. ^

    Of course, effect size and statistical significance are not independent: power analysis relates them to the sample size a study needs in order to detect an assumed effect size. I’m just saying here that significance level and effect size are still two separate properties, and it’s important to keep that in mind. Personally, I get more excited about a study showing a huge effect size with p = 0.06 than about one showing a tiny effect size with p = 0.04. But many people look purely at significance (in order to answer their “is there an effect or not” question), while effect size remains an afterthought.

  2. ^

    An interesting example of this is this (in?)famous study on mask effectiveness against COVID-19, which was widely cited by “COVID skeptics” (among others) as alleged proof that masks are useless. In fact, however, the study suggested that wearing masks outside might reduce the risk of infection by roughly 15% (95% CI: 46% reduction to 23% increase). Unsurprisingly, the study was far too underpowered to achieve significance for such an effect, for three reasons:
    a) the authors studied mask wearing outside rather than inside;
    b) they only checked the protection of the wearers themselves, rather than the protective effect on others;
    c) when assessing the power of their study, they assumed an expected 50% reduction in COVID infections, an almost hilariously large effect for such a specific intervention. This makes me suspect that even the (non-significant) ~15% effect the study found is still larger than the real effect.
    Despite all that, I’m not even sure one can blame the study authors; even with their overly generous assumptions, they had to run the study with over 6000 participants. To detect, say, a 5% effect, which would be much more realistic, the study would have required tens of thousands of participants and much more funding. Still, working with a hypothesis of a 50% reduction in COVID cases through this particular intervention seems almost insane. It’s also unhelpful on a societal level, because what people will of course take away from this is that “wearing masks has no effect”, even though the study actually provided (weak) evidence for mask effectiveness rather than against it.

  3. ^

    While there are some studies on the relationship between car color and accident risk, all the ones I’ve seen are observational rather than RCTs, so even when they control for many factors, confounding effects may still dominate the outcome. So while the effect sizes seem pretty large (often on the order of ~10% differences in accident rates between colors), the real causal effects are probably much smaller.