Causality: a chapter by chapter review

This is a chap­ter by chap­ter re­view of Causal­ity (2nd ed.) by Judea Pearl (UCLA, blog). Like my pre­vi­ous re­view, the in­ten­tion is not to sum­ma­rize but to help read­ers de­ter­mine whether or not they should read the book (and if they do, what parts to read). Read­ing the re­view is in no way a sub­sti­tute for read­ing the book.

I’ll state my ba­sic im­pres­sion of the book up front, with de­tailed com­ments af­ter the chap­ter dis­cus­sions: this book is mon­u­men­tally im­por­tant to any­one in­ter­ested in procur­ing knowl­edge (es­pe­cially causal knowl­edge) from statis­ti­cal data, but it is a heav­ily tech­ni­cal book pri­mar­ily suit­able for ex­perts. The math­e­mat­ics in­volved is not par­tic­u­larly difficult, but its pre­sen­ta­tion re­quires ded­i­cated read­ing and clar­ity of thought. Only the epi­logue, this lec­ture, is suit­able for the gen­eral au­di­ence, and that will be the high­est value por­tion for most read­ers of LW.

1. In­tro­duc­tion to Prob­a­bil­ities, Graphs, and Causal Models

While the de­scrip­tions are com­plete, this chap­ter may be more use­ful as a re­fresher than as an in­tro­duc­tion. The three sec­tions are de­tailed in in­verse pro­por­tion to the ex­pected reader’s fa­mil­iar­ity.

For the reader who’s seen probability calculus before, Pearl’s description of it in 12 pages is short, sweet, and complete. For the reader who hasn’t seen it, that’s just enough space to list the definitions and give a few examples. Compare Eliezer’s explanation of Bayes’ Rule (almost 50 pages) to Pearl’s (around 2).

The sec­tion on graphs moves a lit­tle less quickly, but even so don’t be afraid to find an on­line tu­to­rial on d-sep­a­ra­tion if Pearl’s ex­pla­na­tion is too fast. For some rea­son, he does not men­tion here that sec­tion 11.1.2 (p 335-337 in my copy) is a gen­tler in­tro­duc­tion to d-sep­a­ra­tion. [edit] His blog also linked to this pre­sen­ta­tion, which is an even gen­tler in­tro­duc­tion to graphs, causal net­works, and d-sep­a­ra­tion.

The sec­tion on causal mod­els is the most de­tailed, as it will be new to most read­ers, and closely fol­lows the sec­tion on graphs. Pearl uses an ex­am­ple to demon­strate the use of coun­ter­fac­tu­als, which is a po­tent first glance at the use­ful­ness of causal mod­els.

He also draws an im­por­tant dis­tinc­tion be­tween prob­a­bil­is­tic, statis­ti­cal, and causal pa­ram­e­ters. Prob­a­bil­is­tic pa­ram­e­ters are quan­tities defined in terms of a joint dis­tri­bu­tion. Statis­ti­cal pa­ram­e­ters are quan­tities defined in terms of ob­served vari­ables drawn from a joint dis­tri­bu­tion. Causal pa­ram­e­ters are quan­tities defined in terms of a causal model, and are not statis­ti­cal. (I’ll leave the ex­pla­na­tion of the full im­pli­ca­tions of the dis­tinc­tions to the chap­ter.)

2. A The­ory of In­ferred Causation

Philoso­phers have long grap­pled with the challenge of iden­ti­fy­ing causal in­for­ma­tion from data, es­pe­cially non-ex­per­i­men­tal data. This chap­ter de­tails an al­gorithm to at­tack that prob­lem.

The key conceptual leap is the use of a third variable in the model as a control. Suppose X and Y are correlated; if there is a third variable Z that is correlated with Y but not with X, the natural interpretation is that X and Z both cause Y. That is not the only interpretation consistent with the data, which causes quite a bit of philosophical trouble. Pearl addresses that trouble with stability: only one of the multiple consistent interpretations is stable.
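A minimal simulation sketch of that correlation pattern, in Python. The linear-Gaussian model, the sample size, and the helper name `pearson` are assumptions made for illustration; none of them come from the book. It only shows what the data look like when X and Z are independent causes of Y.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Hypothetical model: X and Z are independent causes of Y.
xs = [rng.gauss(0, 1) for _ in range(n)]
zs = [rng.gauss(0, 1) for _ in range(n)]
ys = [x + z + rng.gauss(0, 1) for x, z in zip(xs, zs)]

corr_xy = pearson(xs, ys)
corr_zy = pearson(zs, ys)
corr_xz = pearson(xs, zs)
print(f"corr(X,Y)={corr_xy:.2f} corr(Z,Y)={corr_zy:.2f} corr(X,Z)={corr_xz:.2f}")
```

If Y instead caused both X and Z, the two would generally be correlated with each other through Y, which is why the absence of an X-Z correlation points toward the "both cause Y" reading.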

Pearl gives the ex­am­ple of a photo of a chair. There are two pri­mary hy­pothe­ses: first, that the un­der­ly­ing sce­nario was a sin­gle chair, and sec­ond, that the un­der­ly­ing sce­nario was two chairs, placed so that the first chair hides the sec­ond. While both sce­nar­ios pre­dict the ob­served data, the first sce­nario is not just sim­pler, but more sta­ble. If the cam­era po­si­tion moved slightly, the sec­ond chair might not be hid­den any­more- and so we should ex­pect two chairs to be visi­ble in most pho­tos of two chair sce­nar­ios.

Pearl also calls on Occam’s Razor, in the form of preferring candidate models and distributions which cannot be overfit to those which can be overfit. With those two reasonable criteria, we can move from an infinite set of possible causal models that could explain the data to a single equivalence class of causal models that most frugally explain the data.

The chap­ter de­scribes the al­gorithm, its func­tion­al­ity, and some im­ple­men­ta­tion de­tails, which I won’t dis­cuss here.

Pearl also dis­cusses how to differ­en­ti­ate be­tween po­ten­tial causes, gen­uine causes, and spu­ri­ous as­so­ci­a­tions given the out­put of the causal in­fer­ence al­gorithm.

The chap­ter con­cludes with some philo­soph­i­cal dis­cus­sion of the in­fluence of time and vari­able choice, as well as defend­ing the three core as­sump­tions (of min­i­mal­ity, the Marko­vian struc­ture of causal mod­els, and sta­bil­ity).

3. Causal Di­a­grams and the Iden­ti­fi­ca­tion of Causal Effects

While we can in­fer causal re­la­tion­ships from data, that task is far eas­ier when we al­low our­selves to as­sume some sen­si­ble causal re­la­tion­ships. This step is nec­es­sary, and de­sir­able even though mak­ing it trans­par­ent is some­times con­tro­ver­sial.

In­ter­est­ing situ­a­tions are of­ten very com­plex, and Pearl shows how causal graphs- even ones where not all nodes may be mea­sured- make it pos­si­ble to nav­i­gate the com­plex­ity of those situ­a­tions. The chap­ter fo­cuses pri­mar­ily on iden­ti­fi­a­bil­ity- that is, if we fix X to be some value x, can we de­ter­mine p(y|do(x))? For an ar­bi­trar­ily large and com­plex graph, the an­swer is non-ob­vi­ous.

The an­swer is non-ob­vi­ous enough that there is mas­sive con­tro­versy be­tween statis­ti­ci­ans and econo­me­tri­ci­ans, which Pearl at­tempted to de­fuse (and de­scribes how well that went at the end of the chap­ter), be­cause there is a sub­tle differ­ence be­tween ob­serv­ing that X=x and set­ting X=x. If we see that the price of corn is $1 a bushel, that im­plies a very differ­ent world than one where we set the price of corn at $1 a bushel. In ap­pli­ca­tions where we want to con­trol a sys­tem, we’re in­ter­ested in the sec­ond- but nor­mal up­dat­ing based on Bayes’ Rule will give us the first. That is, from a statis­ti­cal per­spec­tive, we can always de­ter­mine the joint prob­a­bil­ity dis­tri­bu­tion, and con­di­tion on X=x to get p(y|X=x); but from a causal per­spec­tive this gen­er­ally won’t give us the in­for­ma­tion we want. Differ­ent causal mod­els can have the same joint prob­a­bil­ity dis­tri­bu­tion- and thus look statis­ti­cally in­dis­t­in­guish­able- but give very differ­ent re­sults when X is fixed to a par­tic­u­lar value.
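A small worked sketch of that last point, in Python. The joint distribution below is made up, and the two candidate models (X causes Y, or Y causes X) are the simplest pair I could pick; both are assumptions for illustration rather than an example from the book.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over binary X and Y: joint[(x, y)] = P(X=x, Y=y).
joint = {(0, 0): F(4, 10), (0, 1): F(1, 10),
         (1, 0): F(1, 10), (1, 1): F(4, 10)}

def p_y_given_x(y, x):
    """Ordinary conditioning: P(Y=y | X=x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return joint[(x, y)] / p_x

def p_y(y):
    """Marginal P(Y=y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

# Model A (X -> Y): setting X leaves the mechanism for Y intact,
# so P(y | do(x)) coincides with P(y | x).
do_model_a = p_y_given_x(1, 1)

# Model B (Y -> X): setting X cuts the arrow into X and leaves Y untouched,
# so P(y | do(x)) is just the marginal P(y).
do_model_b = p_y(1)

print(do_model_a, do_model_b)  # 4/5 1/2
```

Both models reproduce the same table of observations, yet they disagree about what happens to Y when X is set to 1.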

When ev­ery­thing is ob­serv­able (and thus de­ter­minis­tic), there’s no challenge in figur­ing out what will hap­pen when X is fixed to a par­tic­u­lar value. When there is un­cer­tainty- that is, only some vari­ables are ob­serv­able- then we need to de­ter­mine if we know enough to still be able to de­ter­mine the effects of fix­ing X.

His ‘in­ter­ven­tion calcu­lus’ de­scribes how to fix a vari­able by mod­ify­ing the graph, and then what you can get out of the new, mod­ified graph. It takes a nec­es­sary de­tour through the im­pacts of con­found­ing vari­ables (or, more pre­cisely, what graph struc­ture that rep­re­sents and how to de­ter­mine iden­ti­fi­a­bil­ity in the light of that graph struc­ture). This is what lets us de­scribe and calcu­late p(y|do(x)).

I should comment that I feel bad reviewing a technical book like this; my summary of forty pages of math is half a page long, because I leave all of the math to the book itself, and just describe the motivation for the math.

4. Ac­tions, Plans, and Direct Effects

This chapter begins with a distinction between acts and actions, very similar to the distinction discussed in the previous chapter. Pearl treats acts as events or reactions to stimuli; because they are caused by the environment, they give evidence about the environment. Actions are treated as deliberative- they can’t be used as evidence because they haven’t happened yet, and are the result of deliberation. They become acts once performed- they’re actions from the inside, but acts from the outside. (Link to algorithm feels like on the inside.) Pearl describes the controversy over Newcomb’s Problem as a confusion over the distinction between acts and actions. Evidential Decision Theory, often called EDT, is discussed and dismissed; because it doesn’t respect this distinction (or, really, any causal information), it gives nonsensical results. For example, it advises that commuters shouldn’t rush to work, because rushing would increase the probability that they’ve overslept.

Pearl gives a brief de­scrip­tion of the re­la­tion­ship be­tween in­fluence di­a­grams, used in de­ci­sion anal­y­sis, and the causal di­a­grams he de­scribes here; ba­si­cally, they’re very similar, al­though the ID liter­a­ture pur­pose­fully sidesteps causal im­pli­ca­tions which are at the fore­front here.

Much of the chap­ter is spent de­scribing the math that de­ter­mines when an ac­tion’s or plan’s effects are iden­ti­fi­able.

Of particular interest is the section on direct effects, which walks through the famous Berkeley Admissions example of Simpson’s Paradox. Pearl presents a modified version in which the school admits students solely based on qualifications, but appears to discriminate on a department-by-department basis, to demonstrate the necessity of using a full causal model, rather than simply adjusting.

5. Causal­ity and Struc­tural Models in So­cial Science and Economics

This chap­ter will be much more sig­nifi­cant to read­ers with ex­pe­rience do­ing eco­nomic or so­cial sci­ence mod­el­ing, but is still worth­while to other read­ers as a demon­stra­tion of the power of causal graphs as a lan­guage.

The part of the chap­ter that is in­ter­est­ing out­side of the con­text of struc­tural mod­els is the part that dis­cusses test­ing of mod­els. Every miss­ing link in a causal graph is the strong pre­dic­tion that those two vari­ables are in­de­pen­dent (if prop­erly con­di­tioned). This pre­sents a ready test of a causal graph- com­pute the co­var­i­ance for ev­ery miss­ing link (af­ter proper con­di­tion­ing), and con­firm that those links are not nec­es­sary. As a statis­ti­cal prac­tice, this sig­nifi­cantly aids in the de­bug­ging of mod­els be­cause it makes lo­cal er­rors ob­vi­ous, even when they might be ob­scured in global er­ror tests.

That said, I found it mildly disconcerting that Pearl did not mention there the rationale for using global tests. That is, if there are twenty missing links in your causal diagram, and you collect real data and calculate covariances, on average you should expect the covariance of one missing link to be statistically significantly different from zero if you’re using a local test at the 5% significance level for each link independently. A global test will look at one statistically significant red flag and ignore it as expected given the number of coefficients.
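The arithmetic behind that expectation, as a short Python sketch. The 5% significance level and the independence of the twenty tests are assumptions; in a real model the tests may well be correlated.

```python
# Assumed setup: 20 independent local tests, each at a 5% significance level,
# and every tested link is truly absent (all null hypotheses hold).
n_tests = 20
alpha = 0.05

expected_false_alarms = n_tests * alpha
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false alarms: {expected_false_alarms:.2f}")
print(f"P(at least one false alarm): {prob_at_least_one:.3f}")
```

Under those assumptions a correct diagram still raises at least one local red flag roughly two times in three.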

In the con­text of struc­tural mod­els, most of the in­ter­est­ing parts of the chap­ter deal with de­ter­min­ing the iden­ti­fi­a­bil­ity of pa­ram­e­ters in the struc­tural mod­els, and then how to in­ter­pret those pa­ram­e­ters. Pearl’s ap­proach is clear, eas­ily un­der­stand­able, and soundly su­pe­rior to al­ter­na­tives that he quotes (pri­mar­ily to demon­strate his su­pe­ri­or­ity to them).

6. Simp­son’s Para­dox, Con­found­ing, and Collapsibility

This chapter begins by dissolving Simpson’s Paradox, which is more precisely called a reversal effect. Pearl gives a simple example: suppose 80 subjects have a disease, and 40 of them take a drug to treat it. 50% (20) of those who take the drug recover, and 40% (16) of those who do not take the drug recover. By itself, this seems to suggest that the drug increases the recovery rate.

The effect is reversed, though, when you take gender into account. Of the men, 30 decided to take the drug- and only 60% (18) of them recovered, compared to 70% (7) of the 10 who decided not to take the drug. Of the women, 10 decided to take the drug- and only 20% (2) of them recovered, compared to 30% (9) of the 30 who decided not to take the drug.

Depict­ing the is­sue causally, the effect is clear: sex im­pacts both the pro­por­tion of sub­jects who take the drug and the base re­cov­ery rate, and the pos­i­tive im­pact of sex on re­cov­ery is mask­ing the nega­tive im­pact of the drug on re­cov­ery. Sim­ply calcu­lat­ing p(re­cov­ery|drug)-p(re­cov­ery|~drug) does not tell us if the drug is helpful. The pa­ram­e­ter we need for that is p(re­cov­ery|do(drug))-p(re­cov­ery|do(~drug)). With causal di­a­grams and a clear con­cep­tual differ­ence be­tween ob­serv­ing and fix­ing events, that’s not a mis­take one would make, and so there’s no para­dox to avoid and noth­ing in­ter­est­ing to see.
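A sketch of both calculations in Python, using the counts from the example. It assumes that sex is the only confounder, so that adjusting for sex (summing P(recovery | drug, sex) weighted by P(sex)) gives p(recovery|do(drug)). The function names are mine, not the book's.

```python
from fractions import Fraction as F

# Counts from the example above: (sex, took_drug) -> (recovered, total).
counts = {
    ("male", True): (18, 30),
    ("male", False): (7, 10),
    ("female", True): (2, 10),
    ("female", False): (9, 30),
}
total = sum(size for _, size in counts.values())  # 80 subjects

def observed_rate(drug):
    """P(recovery | drug): pool both sexes, as in the naive comparison."""
    rows = [cell for (_, took), cell in counts.items() if took == drug]
    return F(sum(rec for rec, _ in rows), sum(size for _, size in rows))

def adjusted_rate(drug):
    """P(recovery | do(drug)), assuming sex is the only confounder:
    sum over sex of P(recovery | drug, sex) * P(sex)."""
    result = F(0)
    for sex in ("male", "female"):
        group_size = sum(size for (s, _), (_, size) in counts.items() if s == sex)
        rec, size = counts[(sex, drug)]
        result += F(rec, size) * F(group_size, total)
    return result

observed_diff = observed_rate(True) - observed_rate(False)
causal_diff = adjusted_rate(True) - adjusted_rate(False)
print(observed_diff, causal_diff)  # 1/10 -1/10
```

The observed difference favors the drug by ten percentage points, while the adjusted difference goes against it by the same amount.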

The rest of the chap­ter dis­cusses con­found­ing, pre­sent­ing a defi­ni­tion of sta­ble no-con­found­ing be­tween vari­ables and show­ing why other defi­ni­tions are less use­ful or rigor­ous. For read­ers who haven’t heard of those al­ter­na­tives be­fore, the com­par­i­sons will not be par­tic­u­larly in­ter­est­ing or en­light­en­ing (as com­pared to the pre­vi­ous chap­ter, where the dis­cus­sion of struc­tural mod­els seems read­ily in­tel­ligible to some­one with lit­tle ex­pe­rience with them), though they do provide some in­sight into the is­sue of con­found­ing.

7. The Logic of Struc­ture-Based Counterfactuals

Pearl re­turns to the topic of coun­ter­fac­tu­als, briefly in­tro­duced be­fore, and gives them a firm math­e­mat­i­cal foun­da­tion and lin­guis­tic in­ter­pre­ta­tion, then makes their use­ful­ness clear. Coun­ter­fac­tu­als are the ba­sis of in­ter­ven­tions in com­plex sys­tems- they en­code the knowl­edge of what con­se­quences a par­tic­u­lar change would have. They also rep­re­sent a con­ve­nient way to store and test causal in­for­ma­tion.

This power to predict the consequences of changes is what makes causal models superior to non-causal models. Pearl gives a great example of a basic econometric situation:

q = b1*p + d1*I + u1
p = b2*q + d2*w + u2

where q is the quantity demanded, I is the household income, p is the price level, w is the wage rate, b1, d1, b2, and d2 are coefficients, and the u_i are uncorrelated error terms. The equilibrium level of price and quantity demanded is determined by the feedback between those two equations.

Pearl iden­ti­fies three quan­tities of in­ter­est:

  1. What is the ex­pected value of the de­mand Q if the price is con­trol­led at p=p0?

  2. What is the ex­pected value of the de­mand Q if the price is re­ported to be p=p0?

  3. Given that the price is cur­rently p=p0, what is the ex­pected value of the de­mand Q if we were to con­trol the price at p=p1?

The sec­ond is the only quan­tity available from stan­dard econo­met­ric anal­y­sis; the causal anal­y­sis Pearl de­scribes eas­ily calcu­lates all three quan­tities. Again, I leave all of the ac­tual math to the book, but this ex­am­ple was vivid enough that I had to reprint it.
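A rough Python sketch of how the three quantities differ. It assumes the linear structural equations q = b1*p + d1*I + u1 and p = b2*q + d2*w + u2, with I, w, u1, and u2 independent, zero-mean, and Gaussian. The coefficient values are invented, and for brevity the sketch averages over I and w rather than conditioning on them. Treat it as an illustration under those assumptions, not as Pearl's derivation.

```python
# Hypothetical coefficients and variances, chosen only for illustration.
b1, d1 = -0.5, 1.0   # demand equation: q = b1*p + d1*I + u1
b2, d2 = 1.0, 1.0    # price equation:  p = b2*q + d2*w + u2
var_I = var_w = var_u1 = var_u2 = 1.0  # independent, zero-mean, Gaussian

# Solving the two equations jointly (the reduced form) gives
#   q = (d1*I + u1 + b1*(d2*w + u2)) / (1 - b1*b2)
#   p = (b2*(d1*I + u1) + d2*w + u2) / (1 - b1*b2)
# The common factor 1/(1 - b1*b2)**2 cancels in the ratio below.
demand_noise = d1**2 * var_I + var_u1   # variance of d1*I + u1
price_noise = d2**2 * var_w + var_u2    # variance of d2*w + u2
cov_qp = b2 * demand_noise + b1 * price_noise
var_p = b2**2 * demand_noise + price_noise
observed_slope = cov_qp / var_p

def demand_if_controlled(p0):
    """Quantity 1: E[Q | do(p = p0)]; only the demand equation is used."""
    return b1 * p0

def demand_if_reported(p0):
    """Quantity 2: E[Q | p = p0], the regression of q on p."""
    return observed_slope * p0

def demand_counterfactual(p0, p1):
    """Quantity 3: E[Q had p been p1 | p = p0].
    Holding I and u1 fixed, moving the price from p0 to p1 shifts q by b1*(p1 - p0)."""
    return demand_if_reported(p0) + b1 * (p1 - p0)

print(demand_if_controlled(2.0), demand_if_reported(2.0), demand_counterfactual(2.0, 3.0))
```

With these made-up numbers the regression slope of q on p is positive (0.25) even though raising the price by intervention lowers demand (slope -0.5), which is the gap between the second quantity and the other two.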

The chap­ter con­tinues with a set of ax­ioms that de­scribe struc­tural coun­ter­fac­tu­als, which then al­lows Pearl to com­pare struc­tural coun­ter­fac­tu­als with for­mu­la­tions at­tempted by oth­ers. Again, for the reader only in­ter­ested in Pearl’s ap­proach, the com­par­i­sons are more te­dious than en­light­en­ing. There are enough pos­si­bly non-ob­vi­ous im­pli­ca­tions to re­ward the ded­i­cated reader, and the reader fa­mil­iar with the ob­ject of the com­par­i­son will find the com­par­i­son far more mean­ingful, but the hur­ried reader would be for­given for skip­ping a few sec­tions.

The dis­cus­sion of ex­o­gene­ity is valuable for all read­ers, though, as it elu­ci­dates a hi­er­ar­chy be­tween graph­i­cal crite­ria, er­ror-based crite­ria, and coun­ter­fac­tual crite­ria. Each of those crite­ria im­plies the one that fol­lows it, but the im­pli­ca­tions do not flow in the re­verse di­rec­tion; an­other ex­am­ple of how the lan­guage of graphs is more pow­er­ful than al­ter­na­tive lan­guages.

8. Im­perfect Ex­per­i­ments: Bound­ing Effects and Counterfactuals

This chap­ter de­scribes how to ex­tract use­ful in­for­ma­tion (through bounds) from im­perfect ex­per­i­ments. For ex­per­i­ments where all ob­served vari­ables are bi­nary (and, if they aren’t, they can be bi­na­rized through par­ti­tion­ing), but un­ob­served vari­ables are free to be mon­strously com­pli­cated, that com­plex­ity can be par­ti­tioned into four classes of re­sponses, to match the four pos­si­ble func­tional forms be­tween bi­nary vari­ables.

Pearl uses the ex­am­ple of med­i­cal drug test­ing- pa­tients are en­couraged to take the drug (ex­per­i­men­tal group) or not (con­trol group), but com­pli­ance may be im­perfect, as pa­tients may not take med­i­ca­tion given to them or pa­tients not given med­i­ca­tion may pro­cure it by other means. Pa­tients can be classed as ei­ther never tak­ing the drug, com­ply­ing with in­struc­tions, defy­ing in­struc­tions, or always tak­ing the drug. Similarly, the drug’s effect on pa­tient re­cov­ery can be clas­sified as never re­cov­er­ing, helping, hurt­ing, or always re­cov­er­ing. The two could ob­vi­ously be re­lated, and so the full joint dis­tri­bu­tion has 15 de­grees of free­dom- but we can pin down enough of those de­grees of free­dom with the ob­ser­va­tions that we make (of en­courage­ment, treat­ment, and then re­cov­ery) to es­tab­lish an up­per and lower bound for the effect that the treat­ment has on re­cov­ery.
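The counting in that paragraph can be checked with a few lines of Python. The tuple encoding (output when the input is 0, output when the input is 1) is my own convention for the sketch; the class labels follow the description above.

```python
from itertools import product

# The four deterministic functions from one binary input to one binary output,
# written as (output when input is 0, output when input is 1).
compliance_types = {
    (0, 0): "never-taker",
    (0, 1): "complier",
    (1, 0): "defier",
    (1, 1): "always-taker",
}
response_types = {
    (0, 0): "never recovers",
    (0, 1): "helped",
    (1, 0): "hurt",
    (1, 1): "always recovers",
}

# Each patient belongs to one compliance type and one response type.
joint_types = list(product(compliance_types.values(), response_types.values()))

# A distribution over 16 cells must sum to 1, leaving 15 free parameters.
degrees_of_freedom = len(joint_types) - 1
print(len(joint_types), degrees_of_freedom)  # 16 15
```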

The ex­am­ples in this chap­ter are much more de­tailed and nu­mer­i­cal; it also in­cludes a sec­tion on Bayesian es­ti­ma­tion of the pa­ram­e­ters as a com­ple­ment to or sub­sti­tute for bound­ing.

9. Prob­a­bil­ity of Cau­sa­tion: In­ter­pre­ta­tion and Identification

This chap­ter defines and differ­en­ti­ates be­tween three types of cau­sa­tion: nec­es­sary causes, suffi­cient causes, and nec­es­sary and suffi­cient causes. When start­ing a fire, oxy­gen is a nec­es­sary cause, but not a suffi­cient cause (given the lack of spon­ta­neous com­bus­tion). Strik­ing a match is both a nec­es­sary cause and a suffi­cient cause, as the fire would not oc­cur with­out the match and strik­ing a match is likely to start a fire. Th­ese in­tu­itive terms are given for­mal math­e­mat­i­cal defi­ni­tions us­ing coun­ter­fac­tu­als, and much of the chap­ter is de­voted to de­ter­min­ing when those coun­ter­fac­tu­als can be uniquely mea­sured (i.e. when they’re iden­ti­fi­able). Sim­ply know­ing the joint prob­a­bil­ity dis­tri­bu­tion is in­suffi­cient, but is suffi­cient to es­tab­lish lower and up­per bounds for those quan­tities. In the pres­ence of cer­tain as­sump­tions or causal graphs, those quan­tities are iden­ti­fi­able.
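As an illustration of what such bounds look like, here is a Python sketch for the probability that a cause is both necessary and sufficient (PNS). It uses the form of the bounds as I recall it for the special case where the treatment is exogenous (unconfounded), so that P(y | x) can stand in for the effect of setting x. The numbers and the function name `pns_bounds` are hypothetical; check the chapter for the exact statement and conditions.

```python
from fractions import Fraction as F

def pns_bounds(p_y_given_x, p_y_given_not_x):
    """Bounds on the probability of necessity and sufficiency, assuming
    exogeneity (no confounding between the cause x and the outcome y):
        max(0, P(y|x) - P(y|x'))  <=  PNS  <=  min(P(y|x), P(y'|x'))
    """
    lower = max(F(0), p_y_given_x - p_y_given_not_x)
    upper = min(p_y_given_x, 1 - p_y_given_not_x)
    return lower, upper

# Hypothetical data: 70% of treated subjects show the outcome, 20% of untreated do.
lower, upper = pns_bounds(F(7, 10), F(2, 10))
print(lower, upper)  # 1/2 7/10
```

The interval has nonzero width: the data alone say the cause was necessary and sufficient for somewhere between 50% and 70% of subjects, and narrowing it further takes extra assumptions.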

10. The Ac­tual Cause

This chap­ter pro­vides a for­mal defi­ni­tion of the con­cept of an “ac­tual cause,” use­ful pri­mar­ily for de­ter­min­ing le­gal li­a­bil­ity. For other con­texts, the con­cept of a suffi­cient cause may be more nat­u­ral. Pearl in­tro­duces the con­cepts of “sus­te­nance,” which is (in­for­mally) that a vari­able’s cur­rent set­ting is enough to cause the out­come, re­gard­less of other con­figu­ra­tions of the sys­tem, and “causal beams,” which are struc­tures used to de­ter­mine sus­te­nance from causal graphs. The chap­ter pro­vides more ex­am­ples of causal di­a­grams, and a bit more in­tu­ition about the var­i­ous kinds of cau­sa­tion, but is pri­mar­ily use­ful for ret­ro­spec­tive rather than pre­dic­tive anal­y­sis.

11. Reflec­tions, Elab­o­ra­tions, and Dis­cus­sions with Readers

This chap­ter bounces from topic to topic, and (per­haps un­sur­pris­ingly, given the ti­tle) elab­o­rates on many sec­tions of the book. It may be worth­while to read through chap­ter 11 in par­allel with the rest of the book, as many re­sponses to let­ters are Pearl clear­ing up a (pre­sum­ably com­mon) con­fu­sion with a con­cept. In­deed, the sec­tion num­bers of this chap­ter match the chap­ter num­bers of the rest of the book, and so 11.3 is a com­pan­ion to 3.

The first response, 11.1.1, is worth reading in full. One paragraph in particular stands out:

Th­ese con­sid­er­a­tions im­ply that the slo­gan “cor­re­la­tion does not im­ply cau­sa­tion” can be trans­lated into a use­ful prin­ci­ple: be­hind ev­ery causal con­clu­sion there must lie some causal as­sump­tion that is not dis­cernible from the dis­tri­bu­tion func­tion.

Epi­logue. The Art and Science of Cause and Effect

The book con­cludes with a pub­lic lec­ture given in 1996. The lec­ture swiftly in­tro­duces con­cepts and their in­for­mal re­la­tion­ships, as well as some of the his­tor­i­cal con­text of the sci­en­tific un­der­stand­ing of causal­ity. The lec­ture moves swiftly but fo­cuses on the nar­ra­tive and mo­ti­va­tion over the math­e­mat­ics.

The pref­ace to the sec­ond edi­tion states that Pearl’s “main au­di­ence is the stu­dents” but the book is ac­tu­ally well-suited to be a refer­ence text for ex­perts. There are no ex­er­cises, and ax­ioms, the­o­rems, and defi­ni­tions out­weigh the ex­am­ples. (As a side note, if you find your­self stat­ing that the proof of a the­o­rem can be found in a pa­per you refer­ence, you are not tar­get­ing in­tro­duc­tory statis­tics stu­dents.)

I would recommend reading the lecture first, then the rest of the book (reading the corresponding section of chapter 11 after each chapter), then the lecture again. The first reading of the lecture will motivate many of the concepts involved in the book, then the book will formalize those concepts, and then a second reading of the epilogue will be useful as a comparative exercise. Indeed, the lay reader is likely to find the lecture engaging and informative and the rest of the book impenetrable, so they should read only the lecture (again, it’s available online). When finding links for this post, I discovered that the two most helpful Amazon reviews also suggested reading the epilogue first.

There are many sec­tions of the book that com­pare Pearl’s ap­proach to other ap­proaches. For read­ers fa­mil­iar with those other ap­proaches, I imag­ine those sec­tions are well worth read­ing as they provide a clearer pic­ture of what Pearl’s ap­proach ac­tu­ally is, why it’s nec­es­sary, and also make mi­s­un­der­stand­ings less likely. For the reader who is not fa­mil­iar with those other ap­proaches, read­ing the com­par­i­sons will some­times provide deeper in­tu­ition, but of­ten just pro­vides his­tor­i­cal con­text. For the reader who has already bought into Pearl’s ap­proach, this can get frus­trat­ing- par­tic­u­larly when his treat­ment of al­ter­na­tives grows com­bat­ive.

Chap­ter 5 is where this be­comes sig­nifi­cantly no­tice­able, al­though I found the com­par­i­sons in that chap­ter helpful; they seemed di­rectly in­for­ma­tive about Pearl’s ap­proach. In sub­se­quent chap­ters, though, the con­vinced reader may skip en­tire sec­tions with lit­tle loss. Un­for­tu­nately, the sep­a­ra­tion is not always clean. For ex­am­ple, in sec­tion 6.1: 6.1.1 is definitely worth read­ing, 6.1.2 prob­a­bly not, but 6.1.3 is a mix­ture of rele­vant and ir­rele­vant; figure 6.2 (around a third of the way through that sec­tion) is helpful for un­der­stand­ing the causal graph ap­proach, but is in­tro­duced solely to poke holes in com­pet­ing ap­proaches! 6.1.4 be­gins com­par­a­tively, with the first bit re­peat­ing that other ap­proaches have prob­lems with this situ­a­tion, but then rapidly shifts to the me­chan­ics of nav­i­gat­ing the situ­a­tion where other ap­proaches founder.

In my pre­vi­ous re­view of Think­ing and De­cid­ing, it seemed nat­u­ral to recom­mend differ­ent sec­tions to differ­ent read­ers, as the book served many pur­poses. Here, the math­e­mat­i­cal de­vel­op­ment builds upon it­self, so at­tempt­ing to read chap­ter 4 with­out read­ing chap­ter 3 seems like a bad idea. Later chap­ters may be ir­rele­vant to some read­ers- chap­ters 9 and 10 are pri­mar­ily use­ful for mak­ing ret­ro­spec­tive and not pre­dic­tive state­ments, though they still provide some in­tu­ition about and ex­pe­rience with ma­nipu­lat­ing causal graphs.

All in all, the book seems deeply im­por­tant. Causal graphs, in­ter­ven­tions, and coun­ter­fac­tu­als are all very sig­nifi­cant con­cepts, and the book serves well as a refer­ence for them but per­haps not as an in­tro­duc­tion to them. It is prob­a­bly best at ex­plain­ing coun­ter­fac­tu­als, both what they are and why they are pow­er­ful, but I would feel far more con­fi­dent recom­mend­ing a less defen­sive vol­ume which fo­cused on the mo­ti­va­tions, ba­sics, and prac­tice for those con­cepts, rather than their math­e­mat­i­cal and the­o­ret­i­cal un­der­pin­nings.

On a more parochial note, much of the more re­cent work refer­enced in the book was done by one of Pearl’s former grad­u­ate stu­dents whose name LWers may rec­og­nize, and a ques­tion by EY prompts an ex­am­ple in 11.3.7.

The first edi­tion of the book is on­line for free here.

Many thanks to ma­jus, who lent me his copy of Causal­ity, with­out which this re­view would have oc­curred much later.