Hearsay, Double Hearsay, and Bayesian Updates

(trig­ger warn­ing: some de­scrip­tion of do­mes­tic vi­o­lence)

Sum­mary: I dis­cuss the strengths and weak­nesses of one way that the Amer­i­can le­gal sys­tem tries to as­sess and cope with the un­re­li­a­bil­ity of cer­tain kinds of ev­i­dence. After ex­plain­ing the rele­vant rules with refer­ences to a few re­cent fa­mous cases and a non-no­table case that I’m work­ing on now, I briefly con­sider whether this part of the ev­i­dence code is above or be­low the san­ity wa­ter­line, and sug­gest an in­cre­men­tal im­prove­ment.

Recently, I got to the point in my legal career where people are trusting me to write evidentiary briefs, i.e., to argue in front of a judge about what kinds of evidence are reliable enough to be safely presented to a jury. There is an odd division of epistemological labor in the American court system: judges are thought to be better than juries at resisting passionate or manipulative oratory, and juries are thought to be better than judges at resisting bribery and (pre-existing) personal hatred. As a result, potentially inflammatory or unreliable evidence is presented first to a judge, who (much like one of Eliezer’s Confessors) is supposed to sift the exhibit to see if normal people can handle it without losing their tenuous grip on sanity. If and only if the evidence seems safe for ordinary human consumption, the judge will allow the lawyers to argue about that evidence in front of the jury. Otherwise, the evidence sits in a cardboard box in an unheated warehouse, safely away from the eyes of the jury, until it’s time for an appeal.

The Hearsay Rule

By way of a con­crete ex­am­ple, one fa­mous re­cent case fea­tured a recorded 911 call made by a do­mes­tic vi­o­lence vic­tim to the emer­gency phone op­er­a­tor. The op­er­a­tor asked ques­tions about the lo­ca­tion and iden­tity of the per­son who was ac­cused of beat­ing the caller. The caller an­swered the ques­tions on tape, ex­plic­itly iden­ti­fy­ing her abuser as Mr. Adrian Martell Davis, and the an­swers were used first to find and ar­rest the sus­pect, and ul­ti­mately to con­vict him. The vic­tim was ap­par­ently too in­timi­dated to tes­tify in open court, and so her recorded state­ment as to the name of her abuser was ab­solutely nec­es­sary to sup­port a con­vic­tion—no record­ing, no con­vic­tion. Un­der the 400-year-old hearsay rule, recorded tes­ti­mony typ­i­cally is not al­lowed to be pre­sented to a jury—courts are con­cerned that the per­son giv­ing the recorded state­ment might be pres­sured by the po­lice in ways that wouldn’t show up on tape, and that al­low­ing a wit­ness to tes­tify with­out show­ing up in court un­fairly de­prives the defen­dant of a chance to (a) cross-ex­am­ine the wit­ness, and (b) have the jury see any fa­cial tics, body lan­guage, etc. that un­der­cut the wit­ness’s cred­i­bil­ity. In the 911 case, though, the Court faced a straight choice be­tween find­ing an ex­cep­tion to the hearsay rule and let­ting an ap­par­ent abuser go free.

In mak­ing this choice, the US Supreme Court man­aged to ig­nore a va­ri­ety of emo­tion­ally salient but episte­molog­i­cally ir­rele­vant dis­trac­tions, such as the se­ri­ous­ness of the crime, the rel­a­tive hel­pless­ness of the vic­tim, and the re­spectabil­ity of the 911 op­er­a­tor. In­stead, the Court fo­cused on the pur­pose for which the 911 state­ments were ob­tained. If the state­ments were ob­tained to help gather in­for­ma­tion needed to safely re­solve an on­go­ing emer­gency, they could be used at trial. If the state­ments, how­ever, were ob­tained to gather in­for­ma­tion about a past event, they could *not* be used at trial.

The theory supporting this distinction seems to have been that the right to cross-examine and the right to have the jury see body language are fungible elements of a more general reliability test. A stranger’s assertion, without more, could be true or could be false. It doesn’t count as very much evidence. To turn an assertion into enough evidence to convict someone beyond a reasonable doubt, you need to show that the assertion comes with “indicia of reliability.” Two of these indicia are cross-examination and body language: if a story checks out despite a vigorous unfriendly interview and the peer pressure of having to tell the story while physically in the room with other people from your community, then that’s pretty good evidence. But you might have reasons to believe a story even if you don’t get cross-examination or body language. In the case of the 911 call, one might think that the caller had a strong motive to tell the truth, because if she didn’t, then the police would go looking for the wrong guy, and her abuser would come find her and continue hurting her. Similarly, one might think that the operators had a strong motive to ask fair, non-leading questions, because if they didn’t get the right answer, then the police might show up in the wrong neighborhood or with the wrong expectations, and there could be an unnecessary firefight. Finally, one could argue that a recorded statement made as events were unfolding is inherently more reliable (in some ways) than a narrative given months or years after the event; human memory gets corrupted faster than 8-track tapes.

Some com­bi­na­tion of these fac­tors con­vinced the Court to ad­mit the ev­i­dence. Other, very similar cases have been de­cided differ­ently. Whether they got that par­tic­u­lar de­ci­sion right or wrong, though, the frame­work of “in­di­cia of re­li­a­bil­ity” is hard-coded into Amer­i­can ev­i­dence law, es­pe­cially for civil cases. If you want to pre­sent ev­i­dence to a jury based on a state­ment that was made out­side of court, you have to give at least one rea­son why the state­ment is nev­er­the­less re­li­able.

Dou­ble and Triple Hearsay

Here’s where things really get interesting: if your out-of-court statement quotes another out-of-court statement, the evidence is called “double hearsay,” and you need to independently verify each statement. If any link in the chain breaks, the whole document gets excluded. For example, in the case I’m working on now, the defendants want to show the jury a report filled out by California’s Occupational Health and Safety Administration (“OSHA”). The OSHA report is based almost entirely on an accident report form filled out by a private corporation. That report form, in turn, is based almost entirely on an informal interview of the only eyewitness to an accident. So the defendants can use the OSHA report if and only if the OSHA report, the accident report, and the informal interview are all reliable. In symbols, writing A, B, and C for the reliability of those three statements: Use ↔ (A ∧ B ∧ C).
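The all-or-nothing rule for layered hearsay can be sketched in a few lines of Python. This is only an illustration: `Layer` and `admissible` are hypothetical names, and reducing each layer to "has a recognized exception or doesn't" is a simplifying assumption, since a real judge weighs far more than a yes/no flag. The `None` on the interview reflects the argument made in this post, not any court's ruling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    name: str
    # Hearsay exception argued for this layer, or None if none has been offered.
    exception: Optional[str]


def admissible(chain):
    """The document comes in only if every layer in the chain qualifies."""
    return all(layer.exception is not None for layer in chain)


# Hypothetical encoding of the three layers described above.
osha_chain = [
    Layer("OSHA report", "public record"),
    Layer("accident report form", "business record"),
    Layer("informal eyewitness interview", None),
]

print(admissible(osha_chain))      # False: one broken link excludes everything
print(admissible(osha_chain[:2]))  # True: the first two layers alone would qualify
```

The point of the conjunction is that strength in the outer layers buys nothing: the weakest link decides the outcome.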

To try to qual­ify the OSHA re­port, the defen­dants are ar­gu­ing that the OSHA re­port is re­li­able un­der the pub­lic record ex­cep­tion to the hearsay rule, mean­ing that the pub­lic offi­cials who pre­pared it had a stronger in­ter­est in ac­cu­rately re­port­ing pub­lic in­for­ma­tion than they did in the out­come of the ac­ci­dent vic­tim’s pri­vate case. To get the ac­ci­dent re­port form in, the defen­dants are ar­gu­ing that it is re­li­able un­der the busi­ness record ex­cep­tion to the hearsay rule, mean­ing that the cor­po­rate offi­cials who pre­pared it had a stronger in­ter­est in mak­ing sure their com­pany had ac­cess to ac­cu­rate in­for­ma­tion about safety risks than they did in the out­come of any one cus­tomer’s law­suit. As for the in­for­mal in­ter­view...well, I hon­estly have no idea how they plan to jus­tify its re­li­a­bil­ity. But, then again, I’m bi­ased. My pro­fes­sional in­ter­est lies in mak­ing sure that the whole string of un­helpful quo­ta­tions stays in a card­board box in a dank garage, far away from any ju­ries.

Do the Rules Work?

So far, I’ve been pleas­antly sur­prised at how well the Amer­i­can le­gal sys­tem han­dles some of these challenges. The fact that we have a two-tiered sys­tem of eval­u­at­ing ev­i­dence at all is a cut above av­er­age—imag­ine, e.g., the doc­tor who ex­am­ines you tak­ing notes on your con­di­tion, fil­ter­ing out any sub­jec­tive com­ments you make about how you’re sure it’s just a cold, and re­port­ing only your ob­jec­tive symp­toms to a sec­ond doc­tor, who then ren­ders a di­ag­no­sis. Or imag­ine a team of busi­ness con­sul­tants who in­ter­view a For­tune 500 com­pany’s lead­er­ship team, and then pass their writ­ten notes back to a team at HQ (who has never met the ex­ec­u­tives) so that HQ can catch any ob­vi­ous mis­takes in rea­son­ing be­fore send­ing out recom­men­da­tions. We know, in­tel­lec­tu­ally, that meet­ing peo­ple tends to make us friendlier to­ward them and more likely to adopt their point of view even if we en­counter no Bayesian ev­i­dence that in­creases the plau­si­bil­ity of their opinions, but our in­sti­tu­tions rarely take steps to guard against that bias.

I think my biggest criticism of the American evidence code is that it doesn’t account for uncertainty in the model. For instance, if I read the headline on a piece of science journalism saying that (e.g.) coffee consumption reduces the risk of prostate cancer, or that receiving spankings in childhood is negatively correlated with conscientiousness as an adult, there are at least six layers of ‘hearsay’: I might have misunderstood the headline, the headline might have mis-summarized the article, the article might have misquoted the scientist, the scientist might have misinterpreted the recorded data, the recorded data might not faithfully reflect what actually happened during the experiment, and the experiment might not faithfully replicate the real-world conditions that interest us.
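To get a feel for how quickly six layers erode confidence, here is a rough sketch. It assumes each step is faithful independently of the others, and the 0.9 figures are made up purely for illustration; they are not estimates of how reliable headlines or experiments actually are.

```python
import math

# Hypothetical probability that each layer passed the claim along faithfully.
layers = {
    "I understood the headline": 0.9,
    "headline summarized the article": 0.9,
    "article quoted the scientist": 0.9,
    "scientist interpreted the data": 0.9,
    "data reflects the experiment": 0.9,
    "experiment reflects the real world": 0.9,
}


def chain_fidelity(step_probs):
    """Probability that every step was faithful, assuming independent steps."""
    return math.prod(step_probs)


overall = chain_fidelity(layers.values())
print(f"{overall:.2f}")  # 0.53
```

Under these assumed numbers, six individually "pretty reliable" steps leave only slightly better than even odds that the claim arrived intact.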

Even if I can articulate plausible reasons why each step in the transmission of information was “reliable,” I should be very skeptical that my *model* of the transmission is accurate. I only have to be wrong about one of the six steps for my estimate of the information’s plausibility to be untrustworthy. If the information would only provide a few decibels of evidence even if it were perfectly reliable, then trying to calculate how many points a semi-reliable piece of evidence is worth can fail because of a low signal-to-noise ratio. E.g., suppose I learn that neither the suspect nor the actual criminal was a redhead. I might be absolutely certain of this new piece of information, but that’s still nowhere near enough evidence to support a conviction. If instead I learn that there is probably something like a 60% chance that neither the suspect nor the criminal had red hair, that datum really doesn’t tell me anything at all; the info shouldn’t shift my prior enough for my posterior to be noticeably different.
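One way to make "a few decibels, further diluted by unreliability" concrete is the toy model below. It is a sketch under assumptions that are mine and not part of any evidence code: with probability `reliability` a report tracks the facts, and otherwise it is noise that turns up at `noise_rate` whether or not the hypothesis is true. The function names and all of the numbers are hypothetical.

```python
import math


def decibels(likelihood_ratio):
    """Evidence strength in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)


def diluted_ratio(p_if_true, p_if_false, reliability, noise_rate=0.5):
    """Likelihood ratio of a report that is only sometimes faithful.

    With probability `reliability` the report tracks the facts; otherwise it
    is noise that appears at `noise_rate` regardless of the hypothesis.
    """
    num = reliability * p_if_true + (1 - reliability) * noise_rate
    den = reliability * p_if_false + (1 - reliability) * noise_rate
    return num / den


# A weak clue: three times likelier if the hypothesis is true than if it is false.
full = decibels(diluted_ratio(0.9, 0.3, reliability=1.0))
shaky = decibels(diluted_ratio(0.9, 0.3, reliability=0.6))
print(f"{full:.1f} dB, {shaky:.1f} dB")
```

With these assumed inputs, a perfectly reliable report is worth a bit under 5 dB and the 60%-reliable version a bit under 3 dB, and the second figure depends heavily on the guessed `reliability` and `noise_rate`. That sensitivity is the signal-to-noise problem: the correction for unreliability is itself an uncertain estimate, applied to evidence that was weak to begin with.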

Although courts are al­lowed to con­sider the ex­tent to which an un­duly long chain of in­fer­ences makes ev­i­dence less “trust­wor­thy,” I think that on bal­ance de­ci­sions would be more ac­cu­rate if there were a firm limit—say, three lay­ers—be­yond which ev­i­dence was sim­ply in­ad­mis­si­ble as a mat­ter of law. If A says that B says that C says that D shot some­one, then no mat­ter how re­li­able we think A, B, and C are, we should prob­a­bly keep that ev­i­dence away from the jury un­less we can haul at least one of B, C, or D into court to an­swer cross-ex­am­i­na­tion.