Book review: The Book of Why, by Judea Pearl and Dana MacKenzie.
This book aims to turn the ideas from Pearl’s seminal Causality into
something that’s readable by a fairly wide audience.
It is somewhat successful. Most of the book is pretty readable, but
parts of it still read like they were written for mathematicians.
History of science
A fair amount of the book covers the era (most of the 20th century) when
statisticians and scientists mostly rejected causality as an appropriate
subject for science. They mostly observed correlations, and carefully
repeated the mantra “correlation does not imply causation”.
Scientists kept wanting to at least hint at causal implications of their
research, but statisticians rejected most attempts to make rigorous
claims about causes.
The one exception was for randomized controlled trials (RCTs).
Statisticians figured out early on that a good RCT can demonstrate that
correlation does imply causation. So RCTs became increasingly important
over much of the 20th century[1].
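
A minimal sketch of why randomization earns that exception, assuming a
made-up world in which a treatment has no effect at all but a hidden
risk factor influences both who takes it and who gets sick. Every
probability below is hypothetical, and the function name
risk_difference is my own:

```python
import random


def risk_difference(n, randomized, seed=0):
    """Return P(sick | treated) - P(sick | untreated) in a simulated study.

    The treatment has NO causal effect here, so any nonzero difference
    is produced by confounding (plus sampling noise).
    """
    rng = random.Random(seed)
    sick = {True: 0, False: 0}
    patients = {True: 0, False: 0}
    for _ in range(n):
        high_risk = rng.random() < 0.5  # hidden confounder
        if randomized:
            # RCT: a coin flip decides treatment, independent of risk.
            treated = rng.random() < 0.5
        else:
            # Observational: high-risk people seek treatment more often.
            treated = rng.random() < (0.8 if high_risk else 0.2)
        # The outcome depends only on the confounder, never on treatment.
        is_sick = rng.random() < (0.3 if high_risk else 0.1)
        patients[treated] += 1
        sick[treated] += is_sick
    return sick[True] / patients[True] - sick[False] / patients[False]


if __name__ == "__main__":
    print("observational:", risk_difference(20000, randomized=False))
    print("randomized:   ", risk_difference(20000, randomized=True))
```

In the observational run the useless treatment looks harmful, because
treated patients are disproportionately high-risk. In the randomized
run the coin flip cuts the link between risk and treatment, so the
difference should hover near zero.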
That created a weird tension, where the use of RCTs made it clear that
scientists valued the concept of causality, but in most other contexts
they tried to talk as if causality wasn’t real. Not quite as definitely
unreal as phlogiston. A bit closer to how behaviorists often tabooed the
ideas that we had internal experiences and consciousness, or how
linguists once banned debates on the origin of language, namely, that it
was dangerous to think science could touch those topics. Or
maybe a bit like heaven and hell—concepts which, even if they are
useful, seem to be forever beyond the reach of science?
But scientists kept wanting to influence the world, rather than just
predict it. So when they couldn’t afford to wait for RCTs, they often
got impatient and acted as if correlations told them something about
causation.
The most conspicuous example is smoking. Scientists saw many hints that
smoking caused cancer, but without an RCT[2], their
standards and vocabulary made it hard to say more than that smoking is
associated with cancer.
This eventually prompted experts to articulate criteria that seemed
somewhat useful at establishing causality. But even in ideal
circumstances, those criteria weren’t convincing enough to produce a
consensus. Authoritative claims about smoking and cancer were delayed
for years by scientists’ discomfort with talking about
causality[3].
It took Pearl to describe how to formulate an unambiguous set of causal
claims, and then say rigorous things about whether the evidence confirms
or discredits the claims.
What went wrong?
The book presents some good hints about why the concept of causality was
tabooed from science for much of the 20th century.
It focuses on the role of R.A. Fisher (also known as one of the main
advocates of frequentism).
Fisher was a zealot whose prestige was somewhat heavily based on his
skill at quantifying uncertainty. In contrast, he didn’t manage to
quantify causality, or even figure out how to talk clearly about it.
Pearl hints that this biased him against causal reasoning.
“path analysis requires scientific thinking, as does every exercise in
causal inference. Statistics, as frequently practiced, discourages it,
and encourages ‘canned’ procedures instead.”
But blaming a few influential people seems to merely describe the tip of
the iceberg. Why did scientists as a group follow Fisher’s lead?
I suggest that the iceberg is better explained by what James C. Scott
describes as high modernism and the desire for legibility.
I see a similar pattern in the 20th century dominance of
frequentism in most fields of science and the rejection of Bayesian
approaches. Anything that required priors (whose source often couldn’t
be rigorously measured) was at odds with the goal of legibility.
The rise and fall of the taboo on causal inference coincide moderately
well with the rise and fall of Soviet-style central planning, planned
cities, and Taylorist factory management.
I also see some overlap with behaviorism, with its attempt to deny the
importance of variables that were hard to measure, and its utopian hopes
for how much its techniques could accomplish.
These patterns all seem to be rooted in overconfident extrapolations
of simple models of what caused progress. I don’t think it’s an accident
that they all peaked near the middle of the 20th century, and were
mostly discredited by the end of the century.
I remember that when I was young, I supported the standard inferences
from the “correlation does not imply causation” mantra, and was briefly
(and less clearly) tempted by the other manifestations of high
modernism. Alas, I don’t remember my reasons for doing so well enough to
be of much use, other than a semi-appropriate respect for the
authorities who were promoting those ideas.
An example of why causal reasoning matters
Here’s an example that the book provides, dealing with non-randomized
studies of a fictitious drug (to illustrate Simpson’s Paradox, but also
to show the difference between statistics and causal inference). Each
study quantifies three variables:
Study 1: drug ← gender → heart attacks
Study 2: drug → blood pressure → heart attacks
The book asks how we know we should treat the middle variables in those
studies differently. The examples come with identical numbers, so that a
statistics program which only sees correlations, and can’t understand
the causal arrows I’ve drawn here, would analyze both studies using the
same methods. The numbers in these studies are chosen so that the
aggregate data suggest an opposite conclusion about the drug from what
we see if we stratify by gender or blood pressure. Standard statistics
won’t tell us which way of looking at data is more informative. But if
we apply a little extra knowledge, it becomes clear that gender was a
confounding variable that should be controlled for (it influenced who
decided to take the drug), whereas blood pressure was a mediator that
tells us how the drug works, and shouldn’t be controlled for.
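
To make the reversal concrete, here is a minimal Python sketch. The
counts are invented for illustration and are not the book’s numbers;
the stratum labels "A" and "B" stand in for the middle variable (gender
in Study 1, blood pressure level in Study 2), and the function names
are my own. The adjusted rate is the standard stratify-and-reweight
calculation: average the within-stratum rates, weighted by how common
each stratum is overall.

```python
from fractions import Fraction

# Invented counts, for illustration only (not the book's numbers).
# Key: (stratum of the middle variable, took the drug?)
# Value: (patients with a heart attack, patients in the cell)
COUNTS = {
    ("A", False): (2, 40),
    ("A", True): (8, 100),
    ("B", False): (30, 100),
    ("B", True): (14, 40),
}
STRATA = ("A", "B")


def stratum_rate(stratum, took_drug):
    """Heart attack rate inside one stratum."""
    attacks, patients = COUNTS[(stratum, took_drug)]
    return Fraction(attacks, patients)


def aggregate_rate(took_drug):
    """Heart attack rate with the strata pooled (no adjustment)."""
    cells = [COUNTS[(s, took_drug)] for s in STRATA]
    return Fraction(sum(a for a, _ in cells), sum(n for _, n in cells))


def adjusted_rate(took_drug):
    """Stratum rates reweighted by each stratum's share of all patients."""
    total = sum(n for _, n in COUNTS.values())
    rate = Fraction(0)
    for s in STRATA:
        stratum_size = COUNTS[(s, False)][1] + COUNTS[(s, True)][1]
        rate += stratum_rate(s, took_drug) * Fraction(stratum_size, total)
    return rate


if __name__ == "__main__":
    for s in STRATA:
        print(s, stratum_rate(s, True), "vs", stratum_rate(s, False))
    print("aggregate", aggregate_rate(True), "vs", aggregate_rate(False))
    print("adjusted", adjusted_rate(True), "vs", adjusted_rate(False))
```

With these counts the drug looks protective in aggregate (11/70 vs
8/35) and harmful within each stratum. Under Study 1’s diagram, the
adjusted comparison (43/200 vs 7/40) is the one to trust; under Study
2’s diagram, the aggregate comparison is. The code can’t tell which
applies. That has to come from the causal diagram.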
People typically don’t find it hard to distinguish between the
hypothesis that a drug caused a change in blood pressure and the
hypothesis that a drug changed patients’ reported gender. We all have a
sufficiently sophisticated model of the world to assume the drug isn’t
changing patients’ gender identity (i.e. we know that if that assumption
were unexpectedly false, we’d hear about it).
Yet canned programs today are not designed to handle that, and it will
be hard to fix programs so that they have the common sense needed to
make those distinctions over a wide variety of domains.
Continuing Problems?
Pearl complains about scientists controlling for too many variables. The
example described above helps explain why controlling for variables is
often harmful, when it’s not informed by a decent causal model. I have
been mildly suspicious of the “controlling for more variables is better”
attitude in the past, but this book clarified the problems well enough
that I should be able to distinguish sensible from foolish attempts at
controlling for variables.
Controlling for confounders seems like an area where science still has a
long way to go before it can live up to Pearl’s ideals.
There’s also some lingering high modernism affecting the status of RCTs
relative to other ways of inferring causality.
A sufficiently well-run RCT can at least create the appearance that
everything important has been quantified. Sampling errors can be
reliably quantified. Then the experimenter can sweep any systemic bias
under the rug, and declare that the hypothesis formation step lies
outside of science, or maybe deny that hypotheses matter (maybe they’re
just looking through all the evidence to see what pops out).
It looks to me like the peer review process still focuses too heavily on
the easy-to-quantify and easy-to-verify steps in the scientific process
(i.e. p-values). When RCTs aren’t done, researchers too often focus on
risk factors and associations, which lets them equivocate about whether
the research enlightens us about causality.
AI
The book points out that an AI will need to reason causally in order to
reach human-level intelligence. It seems like that ought to be
uncontroversial. I’m unsure whether it actually is uncontroversial.
But Pearl goes further, saying that the lack of causal reasoning in AIs
has been “perhaps the biggest roadblock” to human-level intelligence.
I find that somewhat implausible. My intuition is that general-purpose
causal inference won’t be valuable in AIs until those AIs have
world-models which are at least as sophisticated as those of
crows[4], and that when that level is reached, we’ll get
rapid progress at incorporating causal inference into AI.
It’s true that AI research often focuses on data mining (blind empiricism
/ model-free approaches), at the expense of approaches that could
include causal inference. High modernist attitudes may well have hurt AI
research in the past, and that may still be slowing AI research a bit.
But Pearl exaggerates these effects.
To the extent that Pearl identifies tasks that AI can’t yet tackle (e.g.
“What kinds of solar systems are likely to harbor Earth-like planets?”),
they need not just causal reasoning, but also the ability to integrate
knowledge from a wide variety of data sources—and that means learning
a much wider variety of concepts in a single system than AI researchers
currently have the power to handle.
I expect that mainstream machine learning is mostly on track to handle
that variety of concepts any decade now. I expect that until then, AI
will only be able to do causal reasoning on toy problems, regardless of
how well it understands causality.
Conclusion
Pearl is great at evaluating what constitutes clear thinking about
causality. He’s somewhat good at teaching us how to think clearly about
novel causal problems, and rather unremarkable when he ventures outside
the realm of causal inference.
Footnotes
[1] - RCTs (and p-values) don’t seem to be popular in physics or
geology. I’m curious why Pearl doesn’t find this worth noting. I’ve
mentioned before that people seem to care about statistical significance
mainly where powerful interest groups might benefit from false
conclusions.
[2] - The book claims that an RCT for smoking “would be neither
feasible nor ethical”. Clarke’s first law applies here: it looks like
about 8 studies had some sort of randomized interventions which altered
smoking rates, including two studies focused solely on smoking
interventions, which generated important reductions in smoking in the
control group.
The RCTs seem to confirm that smoking causes health problems such as
lung cancer and cardiovascular disease, but suggest that smoking
shortens lifespan by a good deal less than the correlations would
indicate.
[3] - As footnote 2 suggests, there have been some legitimate
puzzles about the effects of smoking. Those sources of uncertainty have
been obscured by the people who signal support for the “smoking is evil”
view, and by smokers and tobacco companies who cling to delusions.
Smokers probably have some unhealthy habits and/or genes that contribute
to cancer via causal pathways other than smoking.
The book notes that there is a “smoking gene” (rs16969968, aka Mr Big),
but mostly it just means that smoking causes more harm for people with
that gene.
Yet the book mostly implies that the anti-smoking crusaders were at
least 90% right about the effects of smoking, when I think the reality
is more complicated.
Pearl thinks quite rigorously when he’s focused exclusively on causal
inference, but outside that domain of expertise, he comes across as no
more careful than an average scientist.
[4] - Pearl would have us believe that causal reasoning is mostly
a recent human invention (in the last 50,000 years). I find Wikipedia’s
description of non-human causal reasoning to be more credible.