Fake Causality

Phlogiston was the 18th century’s answer to the Elemental Fire of the Greek alchemists. Ignite wood, and let it burn. What is the orangey-bright “fire” stuff? Why does the wood transform into ash? To both questions, the 18th-century chemists answered, “phlogiston”.

...and that was it, you see, that was their an­swer: “Phlo­gis­ton.”

Phlo­gis­ton es­caped from burn­ing sub­stances as visi­ble fire. As the phlo­gis­ton es­caped, the burn­ing sub­stances lost phlo­gis­ton and so be­came ash, the “true ma­te­rial”. Flames in en­closed con­tain­ers went out be­cause the air be­came sat­u­rated with phlo­gis­ton, and so could not hold any more. Char­coal left lit­tle resi­due upon burn­ing be­cause it was nearly pure phlo­gis­ton.

Of course, one didn’t use phlo­gis­ton the­ory to pre­dict the out­come of a chem­i­cal trans­for­ma­tion. You looked at the re­sult first, then you used phlo­gis­ton the­ory to ex­plain it. It’s not that phlo­gis­ton the­o­rists pre­dicted a flame would ex­tin­guish in a closed con­tainer; rather they lit a flame in a con­tainer, watched it go out, and then said, “The air must have be­come sat­u­rated with phlo­gis­ton.” You couldn’t even use phlo­gis­ton the­ory to say what you ought not to see; it could ex­plain ev­ery­thing.

This was an ear­lier age of sci­ence. For a long time, no one re­al­ized there was a prob­lem. Fake ex­pla­na­tions don’t feel fake. That’s what makes them dan­ger­ous.

Modern re­search sug­gests that hu­mans think about cause and effect us­ing some­thing like the di­rected acyclic graphs (DAGs) of Bayes nets. Be­cause it rained, the side­walk is wet; be­cause the side­walk is wet, it is slip­pery:

[Rain] → [Side­walk wet] → [Side­walk slip­pery]

From this we can in­fer—or, in a Bayes net, rigor­ously calcu­late in prob­a­bil­ities—that when the side­walk is slip­pery, it prob­a­bly rained; but if we already know that the side­walk is wet, learn­ing that the side­walk is slip­pery tells us noth­ing more about whether it rained.
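To make the two claims concrete, here is a minimal sketch that computes them by brute-force enumeration over the three-node chain. The probabilities are invented purely for illustration, and the names `joint` and `prob_rain` are hypothetical helpers, not part of any library.

```python
from itertools import product

# Assumed, made-up numbers for the chain Rain -> Wet -> Slippery.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}    # P(wet | rain)
P_SLIP_GIVEN_WET = {True: 0.7, False: 0.05}   # P(slippery | wet)

def joint(rain, wet, slip):
    """Joint probability, factorized along the arrows of the chain."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_WET_GIVEN_RAIN[rain] if wet else 1 - P_WET_GIVEN_RAIN[rain]
    p *= P_SLIP_GIVEN_WET[wet] if slip else 1 - P_SLIP_GIVEN_WET[wet]
    return p

def prob_rain(**evidence):
    """P(rain | evidence), summing the joint over all consistent worlds."""
    num = den = 0.0
    for rain, wet, slip in product([True, False], repeat=3):
        world = {"rain": rain, "wet": wet, "slip": slip}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts what we observed
        p = joint(rain, wet, slip)
        den += p
        if rain:
            num += p
    return num / den

prior = prob_rain()
given_slip = prob_rain(slip=True)
given_wet = prob_rain(wet=True)
given_wet_and_slip = prob_rain(wet=True, slip=True)
print(prior, given_slip, given_wet, given_wet_and_slip)
```

Under these assumed numbers, a slippery sidewalk raises the probability of rain above its prior, while adding "slippery" to an already known "wet" leaves it unchanged (up to floating-point rounding).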

Why is fire hot and bright when it burns?

[“Phlo­gis­ton”] → [Fire hot and bright]

It feels like an ex­pla­na­tion. It’s rep­re­sented us­ing the same cog­ni­tive data for­mat. But the hu­man mind does not au­to­mat­i­cally de­tect when a cause has an un­con­strain­ing ar­row to its effect. Worse, thanks to hind­sight bias, it may feel like the cause con­strains the effect, when it was merely fit­ted to the effect.

In­ter­est­ingly, our mod­ern un­der­stand­ing of prob­a­bil­is­tic rea­son­ing about causal­ity can de­scribe pre­cisely what the phlo­gis­ton the­o­rists were do­ing wrong. One of the pri­mary in­spira­tions for Bayesian net­works was notic­ing the prob­lem of dou­ble-count­ing ev­i­dence if in­fer­ence res­onates be­tween an effect and a cause. For ex­am­ple, let’s say that I get a bit of un­re­li­able in­for­ma­tion that the side­walk is wet. This should make me think it’s more likely to be rain­ing. But, if it’s more likely to be rain­ing, doesn’t that make it more likely that the side­walk is wet? And wouldn’t that make it more likely that the side­walk is slip­pery? But if the side­walk is slip­pery, it’s prob­a­bly wet; and then I should again raise my prob­a­bil­ity that it’s rain­ing...
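The runaway loop can be caricatured in a few lines. This is a deliberately simplified sketch, not a real inference algorithm: it collapses the chain into a single odds-form update, and the likelihood ratio of 3 for the unreliable report is an assumption chosen only for illustration.

```python
def update(prior, likelihood_ratio):
    """One Bayesian update in odds form; returns the posterior probability."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

PRIOR_RAIN = 0.2
LR_WET_REPORT = 3.0  # assumed: the report is 3x as likely if it is raining

# Correct bookkeeping: the report is counted exactly once.
correct = update(PRIOR_RAIN, LR_WET_REPORT)

# Faulty bookkeeping: each time the belief "echoes" back from the effect,
# the same report is treated as if it were fresh evidence.
resonating = PRIOR_RAIN
history = []
for _ in range(5):
    resonating = update(resonating, LR_WET_REPORT)
    history.append(resonating)

print(correct)
print(history)
```

The first pass of the loop agrees with the correct answer; every pass after that inflates the probability of rain using nothing but the original, single report.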

Judea Pearl uses the metaphor of an al­gorithm for count­ing sol­diers in a line. Sup­pose you’re in the line, and you see two sol­diers next to you, one in front and one in back. That’s three sol­diers. So you ask the sol­dier next to you, “How many sol­diers do you see?” He looks around and says, “Three”. So that’s a to­tal of six sol­diers. This, ob­vi­ously, is not how to do it.

A smarter way is to ask the sol­dier in front of you, “How many sol­diers for­ward of you?” and the sol­dier in back, “How many sol­diers back­ward of you?” The ques­tion “How many sol­diers for­ward?” can be passed on as a mes­sage with­out con­fu­sion. If I’m at the front of the line, I pass the mes­sage “1 sol­dier for­ward”, for my­self. The per­son di­rectly in back of me gets the mes­sage “1 sol­dier for­ward”, and passes on the mes­sage “2 sol­diers for­ward” to the sol­dier be­hind him. At the same time, each sol­dier is also get­ting the mes­sage “N sol­diers back­ward” from the sol­dier be­hind them, and pass­ing it on as “N+1 sol­diers back­ward” to the sol­dier in front of them. How many sol­diers in to­tal? Add the two num­bers you re­ceive, plus one for your­self: that is the to­tal num­ber of sol­diers in line.

The key idea is that ev­ery sol­dier must sep­a­rately track the two mes­sages, the for­ward-mes­sage and back­ward-mes­sage, and add them to­gether only at the end. You never add any sol­diers from the back­ward-mes­sage you re­ceive to the for­ward-mes­sage you pass back. In­deed, the to­tal num­ber of sol­diers is never passed as a mes­sage—no one ever says it aloud.
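The soldier-counting scheme can be sketched as follows. The function name `count_soldiers` and the list representation of the line are assumptions made for the example; index 0 is the front of the line.

```python
def count_soldiers(n):
    """Return the total each soldier computes for a line of n soldiers."""
    # forward_in[i]: the "k soldiers forward" message soldier i receives
    # from the soldier in front of him (0 for the soldier at the front).
    forward_in = [0] * n
    for i in range(1, n):
        # Soldier i-1 adds one for himself and passes the message back.
        forward_in[i] = forward_in[i - 1] + 1

    # backward_in[i]: the "k soldiers backward" message soldier i receives
    # from the soldier behind him (0 for the soldier at the rear).
    backward_in = [0] * n
    for i in range(n - 2, -1, -1):
        # Soldier i+1 adds one for himself and passes the message forward.
        backward_in[i] = backward_in[i + 1] + 1

    # The two messages are kept apart until this final step, where each
    # soldier adds them together plus one for himself.
    return [forward_in[i] + backward_in[i] + 1 for i in range(n)]

print(count_soldiers(5))
```

Note that the total never appears in either message array; it exists only in the final sum that each soldier computes privately.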

An analogous principle operates in rigorous probabilistic reasoning about causality. If you learn something about whether it’s raining, from some source other than observing the sidewalk to be wet, this will send a forward-message from [rain] to [sidewalk wet] and raise your expectation of the sidewalk being wet. If you observe the sidewalk to be wet, this sends a backward-message to your belief that it is raining, and this message propagates from [rain] to all neighboring nodes except the [sidewalk wet] node. We count each piece of evidence exactly once; no update message ever “bounces” back and forth. The exact algorithm may be found in Judea Pearl’s classic “Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference”.
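As a rough illustration of the separate bookkeeping, here is a sketch for the rain chain with the sidewalk observed to be slippery. It follows the spirit of Pearl's forward (pi) and backward (lambda) messages on a simple chain, but it is not the full algorithm from the book, and the probabilities are assumptions chosen for the example.

```python
# Assumed, illustrative numbers (index 0 = False, 1 = True).
p_rain = [0.8, 0.2]
p_wet_given_rain = [[0.9, 0.1], [0.1, 0.9]]     # [rain][wet]
p_slip_given_wet = [[0.95, 0.05], [0.3, 0.7]]   # [wet][slip]

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

observed_slip = 1  # evidence: the sidewalk is slippery

# Forward message into Wet: built only from the prior on Rain.
pi_wet = [sum(p_rain[r] * p_wet_given_rain[r][w] for r in (0, 1))
          for w in (0, 1)]

# Backward message into Wet: built only from the evidence below it.
lambda_wet = [p_slip_given_wet[w][observed_slip] for w in (0, 1)]

# Backward message from Wet to Rain: uses lambda_wet only, never pi_wet,
# so Rain's own prior is not echoed back to it.
lambda_rain = [sum(p_wet_given_rain[r][w] * lambda_wet[w] for w in (0, 1))
               for r in (0, 1)]

# The two streams are multiplied together only when reading off a belief.
belief_wet = normalize([pi_wet[w] * lambda_wet[w] for w in (0, 1)])
belief_rain = normalize([p_rain[r] * lambda_rain[r] for r in (0, 1)])

print(belief_rain[1], belief_wet[1])
```

The design point is in the computation of `lambda_rain`: had it been built from the combined belief about the wet sidewalk instead of from `lambda_wet` alone, the prior on rain would have been counted twice.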

So what went wrong in phlogiston theory? When we observe that fire is hot, the [fire] node can send backward-evidence to the [“phlogiston”] node, leading us to update our beliefs about phlogiston. But if so, we can’t count this as a successful forward-prediction of phlogiston theory. The message should go in only one direction, and not bounce back.

Alas, hu­man be­ings do not use a rigor­ous al­gorithm for up­dat­ing be­lief net­works. We learn about par­ent nodes from ob­serv­ing chil­dren, and pre­dict child nodes from be­liefs about par­ents. But we don’t keep rigor­ously sep­a­rate books for the back­ward-mes­sage and for­ward-mes­sage. We just re­mem­ber that phlo­gis­ton is hot, which causes fire to be hot. So it seems like phlo­gis­ton the­ory pre­dicts the hot­ness of fire. Or, worse, it just feels like phlo­gis­ton makes the fire hot.

Un­til you no­tice that no ad­vance pre­dic­tions are be­ing made, the non-con­strain­ing causal node is not la­beled “fake”. It’s rep­re­sented the same way as any other node in your be­lief net­work. It feels like a fact, like all the other facts you know: Phlo­gis­ton makes the fire hot.

A prop­erly de­signed AI would no­tice the prob­lem in­stantly. This wouldn’t even re­quire spe­cial-pur­pose code, just cor­rect book­keep­ing of the be­lief net­work. (Sadly, we hu­mans can’t rewrite our own code, the way a prop­erly de­signed AI could.)

Speak­ing of “hind­sight bias” is just the non­tech­ni­cal way of say­ing that hu­mans do not rigor­ously sep­a­rate for­ward and back­ward mes­sages, al­low­ing for­ward mes­sages to be con­tam­i­nated by back­ward ones.

Those who long ago went down the path of phlo­gis­ton were not try­ing to be fools. No sci­en­tist de­liber­ately wants to get stuck in a blind alley. Are there any fake ex­pla­na­tions in your mind? If there are, I guaran­tee they’re not la­beled “fake ex­pla­na­tion”, so pol­ling your thoughts for the “fake” key­word will not turn them up.

Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday. It’s the only way a messy human mind can be guaranteed to send a pure forward message.