Argument Screens Off Authority

Scenario 1: Barry is a famous geologist. Charles is a fourteen-year-old juvenile delinquent with a long arrest record and occasional psychotic episodes. Barry flatly asserts to Arthur some counterintuitive statement about rocks, and Arthur judges it 90% probable. Then Charles makes an equally counterintuitive flat assertion about rocks, and Arthur judges it 10% probable. Clearly, Arthur is taking the speaker’s authority into account in deciding whether to believe the speaker’s assertions.

Scenario 2: David makes a counterintuitive statement about physics and gives Arthur a detailed explanation of the arguments, including references. Ernie makes an equally counterintuitive statement, but gives an unconvincing argument involving several leaps of faith. Both David and Ernie assert that this is the best explanation they can possibly give (to anyone, not just Arthur). Arthur assigns 90% probability to David’s statement after hearing his explanation, but assigns a 10% probability to Ernie’s statement.

It might seem like these two scenarios are roughly symmetrical: both involve taking into account useful evidence, whether strong versus weak authority, or strong versus weak argument.

But now suppose that Arthur asks Barry and Charles to make full technical cases, with references; and that Barry and Charles present equally good cases, and Arthur looks up the references and they check out. Then Arthur asks David and Ernie for their credentials, and it turns out that David and Ernie have roughly the same credentials—maybe they’re both clowns, maybe they’re both physicists.

Assuming that Arthur is knowledgeable enough to understand all the technical arguments—otherwise they’re just impressive noises—it seems that Arthur should view David as having a great advantage in plausibility over Ernie, while Barry has at best a minor advantage over Charles.

Indeed, if the technical arguments are good enough, Barry’s advantage over Charles may not be worth tracking. A good technical argument is one that eliminates reliance on the personal authority of the speaker.

Similarly, if we really believe Ernie that the argument he gave is the best argument he could give, which includes all of the inferential steps that Ernie executed, and all of the support that Ernie took into account—citing any authorities that Ernie may have listened to himself—then we can pretty much ignore any information about Ernie’s credentials. Ernie can be a physicist or a clown; it shouldn’t matter. (Again, this assumes we have enough technical ability to process the argument. Otherwise, Ernie is simply uttering mystical syllables, and whether we "believe" these syllables depends a great deal on his authority.)

So it seems there’s an asymmetry between argument and authority. If we know authority we are still interested in hearing the arguments; but if we know the arguments fully, we have very little left to learn from authority.

Clearly (says the novice) authority and argument are fundamentally different kinds of evidence, a difference unaccountable in the boringly clean methods of Bayesian probability theory.1 For while the strength of the evidences—90% versus 10%—is just the same in both cases, they do not behave similarly when combined. How will we account for this?

Here’s half a technical demonstration of how to represent this difference in probability theory. (The rest you can take on my personal authority, or look up in the references.)

If P(H|E1) = 90% and P(H|E2) = 9%, what is the probability P(H|E1,E2)? If learning E1 is true leads us to assign 90% probability to H, and learning E2 is true leads us to assign 9% probability to H, then what probability should we assign to H if we learn both E1 and E2? This is simply not something you can calculate in probability theory from the information given. No, the missing information is not the prior probability of H. The events E1 and E2 may not be independent of each other.

Suppose that H is "My sidewalk is slippery," E1 is "My sprinkler is running," and E2 is "It’s night." The sidewalk is slippery starting from one minute after the sprinkler starts, until just after the sprinkler finishes, and the sprinkler runs for ten minutes. So if we know the sprinkler is on, the probability is 90% that the sidewalk is slippery. The sprinkler is on during 10% of the nighttime, so if we know that it’s night, the probability of the sidewalk being slippery is 9%. If we know that it’s night and the sprinkler is on—that is, if we know both facts—the probability of the sidewalk being slippery is 90%.
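To make the arithmetic concrete, here is a minimal sketch in Python. The minute counts are hypothetical (a simplified 100-minute night plus a 100-minute day, chosen only to match the percentages in the story), and the helper prob is an illustrative convenience, not anything from the text:

```python
from fractions import Fraction

# Hypothetical minute counts for a toy world: a simplified 100-minute night in
# which the sprinkler runs for 10 minutes and the sidewalk is slippery for 9 of
# them, plus a 100-minute day with the sprinkler off. Counts are illustrative only.
counts = {
    # (night, sprinkler, slippery): number of minutes
    (True,  True,  True):  9,
    (True,  True,  False): 1,
    (True,  False, False): 90,
    (False, False, False): 100,
}

def prob(event, given=lambda n, s, w: True):
    """P(event | given), where both are predicates over (night, sprinkler, slippery)."""
    total = sum(c for (n, s, w), c in counts.items() if given(n, s, w))
    hits = sum(c for (n, s, w), c in counts.items() if given(n, s, w) and event(n, s, w))
    return Fraction(hits, total)

slippery = lambda n, s, w: w
print(prob(slippery, given=lambda n, s, w: s))        # P(slippery | sprinkler)        = 9/10
print(prob(slippery, given=lambda n, s, w: n))        # P(slippery | night)            = 9/100
print(prob(slippery, given=lambda n, s, w: n and s))  # P(slippery | night, sprinkler) = 9/10
```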

We can represent this in a graphical model as follows:

[Night] → [Sprinkler] → [Slippery]

Whether or not it’s Night causes the Sprinkler to be on or off, and whether the Sprinkler is on causes the sidewalk to be Slippery or unSlippery.

The direction of the arrows is meaningful. Say we had:

[Night] → [Sprinkler] ← [Slippery]

This would mean that, if I didn’t know anything about the sprinkler, the probability of Nighttime and Slipperiness would be independent of each other. For example, suppose that I roll Die One and Die Two, and add up the showing numbers to get the Sum:

[Die 1] → [Sum] ← [Die 2]

If you don’t tell me the sum of the two numbers, and you tell me the first die showed 6, this doesn’t tell me anything about the result of the second die, yet. But if you now also tell me the sum is 7, I know the second die showed 1.

Figuring out when various pieces of information are dependent or independent of each other, given various background knowledge, actually turns into a quite technical topic. The books to read are Judea Pearl’s Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference and Causality: Models, Reasoning, and Inference. (If you only have time to read one book, read the first one.)

If you know how to read causal graphs, then you look at the dice-roll graph and immediately see:

P(Die 1,Die 2) = P(Die 1) × P(Die 2)

P(Die 1,Die 2|Sum) ≠ P(Die 1|Sum) × P(Die 2|Sum).
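Here is a minimal sketch in Python that checks both facts by enumerating the thirty-six equally likely outcomes (the helper p is just an illustrative convenience for computing conditional probabilities by counting):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely (die1, die2) pairs

def p(event, given=lambda d1, d2: True):
    """P(event | given) under the uniform distribution over the 36 outcomes."""
    pool = [o for o in outcomes if given(*o)]
    return Fraction(sum(1 for o in pool if event(*o)), len(pool))

# Unconditionally, the dice are independent:
# P(die1=6, die2=1) = P(die1=6) * P(die2=1)
assert (p(lambda d1, d2: d1 == 6 and d2 == 1)
        == p(lambda d1, d2: d1 == 6) * p(lambda d1, d2: d2 == 1))

# Conditioned on the Sum, they are not:
# P(die1=6, die2=1 | sum=7) != P(die1=6 | sum=7) * P(die2=1 | sum=7)
given7 = lambda d1, d2: d1 + d2 == 7
assert (p(lambda d1, d2: d1 == 6 and d2 == 1, given7)
        != p(lambda d1, d2: d1 == 6, given7) * p(lambda d1, d2: d2 == 1, given7))
```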

If you look at the correct sidewalk diagram, you see facts like:

P(Slippery|Night) ≠ P(Slippery)

P(Slippery|Sprinkler) ≠ P(Slippery)

P(Slippery|Night,Sprinkler) = P(Slippery|Sprinkler).

That is, the probability of the sidewalk being Slippery, given knowledge about the Sprinkler and the Night, is the same probability we would assign if we knew only about the Sprinkler. Knowledge of the Sprinkler has made knowledge of the Night irrelevant to inferences about Slipperiness.
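As a cross-check, here is a sketch that builds the joint distribution by factorizing along the chain Night → Sprinkler → Slippery, using illustrative conditional probabilities consistent with the story (assuming, for simplicity, that the sprinkler never runs in the daytime and that nothing but the sprinkler makes the sidewalk slippery):

```python
from itertools import product

# Illustrative conditional probabilities for the chain Night -> Sprinkler -> Slippery.
p_night = 0.5
p_sprinkler_given_night = {True: 0.1, False: 0.0}
p_slippery_given_sprinkler = {True: 0.9, False: 0.0}

# Joint distribution factorized along the chain.
joint = {}
for night, sprinkler, slippery in product([True, False], repeat=3):
    p = p_night if night else 1 - p_night
    p *= p_sprinkler_given_night[night] if sprinkler else 1 - p_sprinkler_given_night[night]
    p *= p_slippery_given_sprinkler[sprinkler] if slippery else 1 - p_slippery_given_sprinkler[sprinkler]
    joint[(night, sprinkler, slippery)] = p

def prob(event, given=lambda n, s, w: True):
    total = sum(p for (n, s, w), p in joint.items() if given(n, s, w))
    return sum(p for (n, s, w), p in joint.items() if given(n, s, w) and event(n, s, w)) / total

slippery = lambda n, s, w: w
print(prob(slippery))                           # P(Slippery)                    ≈ 0.045
print(prob(slippery, lambda n, s, w: n))        # P(Slippery | Night)            ≈ 0.09
print(prob(slippery, lambda n, s, w: s))        # P(Slippery | Sprinkler)        ≈ 0.9
print(prob(slippery, lambda n, s, w: n and s))  # P(Slippery | Night, Sprinkler) ≈ 0.9
```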

This is known as screening off, and the criterion that lets us read such conditional independences off causal graphs is known as D-separation.

For the case of argument and authority, the causal diagram looks like this:

[Truth] → [Argument] → [Expert Belief]

If something is true, then it therefore tends to have arguments in favor of it, and the experts therefore observe these evidences and change their opinions. (In theory!)

If we see that an expert believes something, we infer back to the existence of evidence-in-the-abstract (even though we don’t know what that evidence is exactly), and from the existence of this abstract evidence, we infer back to the truth of the proposition.

But if we know the value of the Argument node, this D-separates the node “Truth” from the node “Expert Belief” by blocking all paths between them, according to certain technical criteria for “path blocking” that seem pretty obvious in this case. So even without checking the exact probability distribution, we can read off from the graph that:

P(truth|argument,expert) = P(truth|argument).
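Here is a minimal sketch of that claim, using a hypothetical chain Truth → Argument → Expert Belief with arbitrary (randomly chosen) conditional probability tables, to illustrate that the equality follows from the graph structure rather than from any particular numbers:

```python
import random
from itertools import product

random.seed(0)

# Hypothetical conditional probability tables for the chain
# Truth -> Argument -> Expert Belief; the numbers are arbitrary.
p_truth = random.random()
p_argument_given_truth = {True: random.random(), False: random.random()}
p_expert_given_argument = {True: random.random(), False: random.random()}

# Joint distribution factorized along the chain.
joint = {}
for truth, argument, expert in product([True, False], repeat=3):
    p = p_truth if truth else 1 - p_truth
    p *= p_argument_given_truth[truth] if argument else 1 - p_argument_given_truth[truth]
    p *= p_expert_given_argument[argument] if expert else 1 - p_expert_given_argument[argument]
    joint[(truth, argument, expert)] = p

def prob(event, given=lambda t, a, e: True):
    total = sum(p for (t, a, e), p in joint.items() if given(t, a, e))
    return sum(p for (t, a, e), p in joint.items() if given(t, a, e) and event(t, a, e)) / total

truth = lambda t, a, e: t
for a_val, e_val in product([True, False], repeat=2):
    lhs = prob(truth, lambda t, a, e: a == a_val and e == e_val)  # P(truth | argument, expert)
    rhs = prob(truth, lambda t, a, e: a == a_val)                 # P(truth | argument)
    assert abs(lhs - rhs) < 1e-12  # expert belief is screened off by the argument
```

The assertion holds for every setting of the other two nodes because, in this chain, knowing the Argument node blocks the path from Expert Belief back to Truth.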

This does not represent a contradiction of ordinary probability theory. It’s just a more compact way of expressing certain probabilistic facts. You could read the same equalities and inequalities off an unadorned probability distribution—but it would be harder to see it by eyeballing. Authority and argument don’t need two different kinds of probability, any more than sprinklers are made out of ontologically different stuff than sunlight.

In practice you can never completely eliminate reliance on authority. Good authorities are more likely to know about any counterevidence that exists and should be taken into account; a lesser authority is less likely to know this, which makes their arguments less reliable. This is not a factor you can eliminate merely by hearing the evidence they did take into account.

It’s also very hard to reduce arguments to pure math; and otherwise, judging the strength of an inferential step may rely on intuitions you can’t duplicate without the same thirty years of experience.

There is an ineradicable legitimacy to assigning slightly higher probability to what E. T. Jaynes tells you about Bayesian probability, than you assign to Eliezer Yudkowsky making the exact same statement. Fifty additional years of experience should not count for literally zero influence.

But this slight strength of authority is only ceteris paribus, and can easily be overwhelmed by stronger arguments. I have a minor erratum in one of Jaynes’s books—because algebra trumps authority.

1. See “What Is Evidence?” in Map and Territory.