Why We Can’t Take Expected Value Estimates Literally (Even When They’re Unbiased)

Note: I am cross-post­ing this GiveWell Blog post, af­ter con­sult­ing a cou­ple of com­mu­nity mem­bers, be­cause it is rele­vant to many top­ics dis­cussed on Less Wrong, par­tic­u­larly effi­cient char­ity/​op­ti­mal philan­thropy and Pas­cal’s Mug­ging. The post in­cludes a pro­posed “solu­tion” to the dilemma posed by Pas­cal’s Mug­ging that has not been pro­posed be­fore as far as I know. It is longer than usual for a Less Wrong post, so I have put ev­ery­thing but the sum­mary be­low the fold. Also, note that I use the term “ex­pected value” be­cause it is more generic than “ex­pected util­ity”; the ar­gu­ments here per­tain to es­ti­mat­ing the ex­pected value of any quan­tity, not just util­ity.

While some peo­ple feel that GiveWell puts too much em­pha­sis on the mea­surable and quan­tifi­able, there are oth­ers who go fur­ther than we do in quan­tifi­ca­tion, and jus­tify their giv­ing (or other) de­ci­sions based on fully ex­plicit ex­pected-value for­mu­las. The lat­ter group tends to cri­tique us—or at least dis­agree with us—based on our prefer­ence for strong ev­i­dence over high ap­par­ent “ex­pected value,” and based on the heavy role of non-for­mal­ized in­tu­ition in our de­ci­sion­mak­ing. This post is di­rected at the lat­ter group.

We be­lieve that peo­ple in this group are of­ten mak­ing a fun­da­men­tal mis­take, one that we have long had in­tu­itive ob­jec­tions to but have re­cently de­vel­oped a more for­mal (though still fairly rough) cri­tique of. The mis­take (we be­lieve) is es­ti­mat­ing the “ex­pected value” of a dona­tion (or other ac­tion) based solely on a fully ex­plicit, quan­tified for­mula, many of whose in­puts are guesses or very rough es­ti­mates. We be­lieve that any es­ti­mate along these lines needs to be ad­justed us­ing a “Bayesian prior”; that this ad­just­ment can rarely be made (rea­son­ably) us­ing an ex­plicit, for­mal calcu­la­tion; and that most at­tempts to do the lat­ter, even when they seem to be mak­ing very con­ser­va­tive down­ward ad­just­ments to the ex­pected value of an op­por­tu­nity, are not mak­ing nearly large enough down­ward ad­just­ments to be con­sis­tent with the proper Bayesian ap­proach.

This view of ours illus­trates why—while we seek to ground our recom­men­da­tions in rele­vant facts, calcu­la­tions and quan­tifi­ca­tions to the ex­tent pos­si­ble—ev­ery recom­men­da­tion we make in­cor­po­rates many differ­ent forms of ev­i­dence and in­volves a strong dose of in­tu­ition. And we gen­er­ally pre­fer to give where we have strong ev­i­dence that dona­tions can do a lot of good rather than where we have weak ev­i­dence that dona­tions can do far more good—a prefer­ence that I be­lieve is in­con­sis­tent with the ap­proach of giv­ing based on ex­plicit ex­pected-value for­mu­las (at least those that (a) have sig­nifi­cant room for er­ror (b) do not in­cor­po­rate Bayesian ad­just­ments, which are very rare in these analy­ses and very difficult to do both for­mally and rea­son­ably).

The rest of this post will:

  • Lay out the “ex­plicit ex­pected value for­mula” ap­proach to giv­ing, which we op­pose, and give ex­am­ples.

  • Give the in­tu­itive ob­jec­tions we’ve long had to this ap­proach, i.e., ways in which it seems in­tu­itively prob­le­matic.

  • Give a clean ex­am­ple of how a Bayesian ad­just­ment can be done, and can be an im­prove­ment on the “ex­plicit ex­pected value for­mula” ap­proach.

  • Pre­sent a ver­sa­tile for­mula for mak­ing and illus­trat­ing Bayesian ad­just­ments that can be ap­plied to char­ity cost-effec­tive­ness es­ti­mates.

  • Show how a Bayesian ad­just­ment avoids the Pas­cal’s Mug­ging prob­lem that those who rely on ex­plicit ex­pected value calcu­la­tions seem prone to.

  • Dis­cuss how one can prop­erly ap­ply Bayesian ad­just­ments in other cases, where less in­for­ma­tion is available.

  • Con­clude with the fol­low­ing take­aways:

    • Any ap­proach to de­ci­sion-mak­ing that re­lies only on rough es­ti­mates of ex­pected value—and does not in­cor­po­rate prefer­ences for bet­ter-grounded es­ti­mates over shak­ier es­ti­mates—is flawed.

    • When aiming to max­i­mize ex­pected pos­i­tive im­pact, it is not ad­vis­able to make giv­ing de­ci­sions based fully on ex­plicit for­mu­las. Proper Bayesian ad­just­ments are im­por­tant and are usu­ally overly difficult to for­mal­ize.

    • The above point is a general defense of resisting arguments that both (a) seem intuitively problematic and (b) have thin evidential support and/or room for significant error.

The ap­proach we op­pose: “ex­plicit ex­pected-value” (EEV) decisionmaking

We term the ap­proach this post ar­gues against the “ex­plicit ex­pected-value” (EEV) ap­proach to de­ci­sion­mak­ing. It gen­er­ally in­volves an ar­gu­ment of the form:

  • I es­ti­mate that each dol­lar spent on Pro­gram P has a value of V [in terms of lives saved, dis­abil­ity-ad­justed life-years, so­cial re­turn on in­vest­ment, or some other met­ric]. Granted, my es­ti­mate is ex­tremely rough and un­re­li­able, and in­volves ge­o­met­ri­cally com­bin­ing mul­ti­ple un­re­li­able figures—but it’s un­bi­ased, i.e., it seems as likely to be too pes­simistic as it is to be too op­ti­mistic. There­fore, my es­ti­mate V rep­re­sents the per-dol­lar ex­pected value of Pro­gram P.

  • I don’t know how good Char­ity C is at im­ple­ment­ing Pro­gram P, but even if it wastes 75% of its money or has a 75% chance of failure, its per-dol­lar ex­pected value is still 25%*V, which is still ex­cel­lent.

Ex­am­ples of the EEV ap­proach to de­ci­sion­mak­ing:

  • In a 2010 ex­change, Will Crouch of Giv­ing What We Can ar­gued:

    DtW [De­worm the World] spends about 74% on tech­ni­cal as­sis­tance and scal­ing up de­worm­ing pro­grams within Kenya and In­dia … Let’s as­sume (very im­plau­si­bly) that all other money (spent on ad­vo­cacy etc) is wasted, and as­sess the char­ity solely on that 74%. It still would do very well (tak­ing DCP2: $3.4/​DALY * (1/​0.74) = $4.6/​DALY – slightly bet­ter than their most op­ti­mistic es­ti­mate for DOTS (for TB), and far bet­ter than their es­ti­mates for in­sec­ti­cide treated nets, con­dom dis­tri­bu­tion, etc). So, though find­ing out more about their ad­vo­cacy work is ob­vi­ously a great thing to do, the ad­vo­cacy ques­tions don’t need to be an­swered in or­der to make a recom­men­da­tion: it seems that DtW [is] worth recom­mend­ing on the ba­sis of their con­trol pro­grams alone.

  • The Back of the En­velope Guide to Philan­thropy lists rough calcu­la­tions for the value of differ­ent char­i­ta­ble in­ter­ven­tions. Th­ese calcu­la­tions im­ply (among other things) that donat­ing for poli­ti­cal ad­vo­cacy for higher for­eign aid is be­tween 8x and 22x as good an in­vest­ment as donat­ing to VillageReach, and the pre­sen­ta­tion and im­pli­ca­tion are that this calcu­la­tion ought to be con­sid­ered de­ci­sive.

  • We’ve en­coun­tered nu­mer­ous peo­ple who ar­gue that char­i­ties work­ing on re­duc­ing the risk of sud­den hu­man ex­tinc­tion must be the best ones to sup­port, since the value of sav­ing the hu­man race is so high that “any imag­in­able prob­a­bil­ity of suc­cess” would lead to a higher ex­pected value for these char­i­ties than for oth­ers.

  • “Pas­cal’s Mug­ging” is of­ten seen as the re­duc­tio ad ab­sur­dum of this sort of rea­son­ing. The idea is that if a per­son de­mands $10 in ex­change for re­frain­ing from an ex­tremely harm­ful ac­tion (one that nega­tively af­fects N peo­ple for some huge N), then ex­pected-value calcu­la­tions de­mand that one give in to the per­son’s de­mands: no mat­ter how un­likely the claim, there is some N big enough that the “ex­pected value” of re­fus­ing to give the $10 is hugely nega­tive.

The crucial characteristic of the EEV approach is that it does not incorporate a systematic preference for better-grounded estimates over rougher estimates. It ranks charities/actions based simply on their estimated value, ignoring differences in the reliability and robustness of the estimates.

Informal objections to EEV decisionmaking

There are many ways in which the sort of reasoning laid out above seems (to us) to fail a common sense test.

  • There seems to be noth­ing in EEV that pe­nal­izes rel­a­tive ig­no­rance or rel­a­tively poorly grounded es­ti­mates, or re­wards in­ves­ti­ga­tion and the form­ing of par­tic­u­larly well grounded es­ti­mates. If I can liter­ally save a child I see drown­ing by ru­in­ing a $1000 suit, but in the same mo­ment I make a wild guess that this $1000 could save 2 lives if put to­ward med­i­cal re­search, EEV seems to in­di­cate that I should opt for the lat­ter.

  • Be­cause of this, a world in which peo­ple acted based on EEV would seem to be prob­le­matic in var­i­ous ways.

    • In such a world, it seems that nearly all al­tru­ists would put nearly all of their re­sources to­ward helping peo­ple they knew lit­tle about, rather than helping them­selves, their fam­i­lies and their com­mu­ni­ties. I be­lieve that the world would be worse off if peo­ple be­haved in this way, or at least if they took it to an ex­treme. (There are always more peo­ple you know lit­tle about than peo­ple you know well, and EEV es­ti­mates of how much good you can do for peo­ple you don’t know seem likely to have higher var­i­ance than EEV es­ti­mates of how much good you can do for peo­ple you do know. There­fore, it seems likely that the high­est-EEV ac­tion di­rected at peo­ple you don’t know will have higher EEV than the high­est-EEV ac­tion di­rected at peo­ple you do know.)

    • In such a world, when peo­ple de­cided that a par­tic­u­lar en­deavor/​ac­tion had out­stand­ingly high EEV, there would (too of­ten) be no jus­tifi­ca­tion for costly skep­ti­cal in­quiry of this en­deavor/​ac­tion. For ex­am­ple, say that peo­ple were try­ing to ma­nipu­late the weather; that some­one hy­poth­e­sized that they had no power for such ma­nipu­la­tion; and that the EEV of try­ing to ma­nipu­late the weather was much higher than the EEV of other things that could be done with the same re­sources. It would be difficult to jus­tify a costly in­ves­ti­ga­tion of the “try­ing to ma­nipu­late the weather is a waste of time” hy­poth­e­sis in this frame­work. Yet it seems that when peo­ple are valu­ing one ac­tion far above oth­ers, based on thin in­for­ma­tion, this is the time when skep­ti­cal in­quiry is needed most. And more gen­er­ally, it seems that challeng­ing and in­ves­ti­gat­ing our most firmly held, “high-es­ti­mated-prob­a­bil­ity” be­liefs—even when do­ing so has been costly—has been quite benefi­cial to so­ciety.

  • Re­lated: giv­ing based on EEV seems to cre­ate bad in­cen­tives. EEV doesn’t seem to al­low re­ward­ing char­i­ties for trans­parency or pe­nal­iz­ing them for opac­ity: it sim­ply recom­mends giv­ing to the char­ity with the high­est es­ti­mated ex­pected value, re­gard­less of how well-grounded the es­ti­mate is. There­fore, in a world in which most donors used EEV to give, char­i­ties would have ev­ery in­cen­tive to an­nounce that they were fo­cus­ing on the high­est ex­pected-value pro­grams, with­out dis­clos­ing any de­tails of their op­er­a­tions that might show they were achiev­ing less value than the­o­ret­i­cal es­ti­mates said they ought to be.

  • If you are bas­ing your ac­tions on EEV anal­y­sis, it seems that you’re very open to be­ing ex­ploited by Pas­cal’s Mug­ging: a tiny prob­a­bil­ity of a huge-value ex­pected out­come can come to dom­i­nate your de­ci­sion­mak­ing in ways that seem to vi­o­late com­mon sense. (We dis­cuss this fur­ther be­low.)

  • If I’m de­cid­ing be­tween eat­ing at a new restau­rant with 3 Yelp re­views av­er­ag­ing 5 stars and eat­ing at an older restau­rant with 200 Yelp re­views av­er­ag­ing 4.75 stars, EEV seems to im­ply (us­ing Yelp rat­ing as a stand-in for “ex­pected value of the ex­pe­rience”) that I should opt for the former. As dis­cussed in the next sec­tion, I think this is the purest demon­stra­tion of the prob­lem with EEV and the need for Bayesian ad­just­ments.

In the re­main­der of this post, I pre­sent what I be­lieve is the right for­mal frame­work for my ob­jec­tions to EEV. How­ever, I have more con­fi­dence in my in­tu­itions—which are re­lated to the above ob­ser­va­tions—than in the frame­work it­self. I be­lieve I have for­mal­ized my thoughts cor­rectly, but if the re­main­der of this post turned out to be flawed, I would likely re­main in ob­jec­tion to EEV un­til and un­less one could ad­dress my less for­mal mis­giv­ings.

Sim­ple ex­am­ple of a Bayesian ap­proach vs. an EEV approach

It seems fairly clear that a restau­rant with 200 Yelp re­views, av­er­ag­ing 4.75 stars, ought to out­rank a restau­rant with 3 Yelp re­views, av­er­ag­ing 5 stars. Yet this rank­ing can’t be jus­tified in an EEV-style frame­work, in which op­tions are ranked by their es­ti­mated av­er­age/​ex­pected value. How, in fact, does Yelp han­dle this situ­a­tion?

Un­for­tu­nately, the an­swer ap­pears to be undis­closed in Yelp’s case, but we can get a hint from a similar site: BeerAd­vo­cate, a site that ranks beers us­ing sub­mit­ted re­views. It states:

Lists are generated using a Bayesian estimate that pulls data from millions of user reviews (not hand-picked) and normalizes scores based on the number of reviews for each beer. The general statistical formula is:

weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:
R = review average for the beer
v = number of reviews for the beer
m = minimum reviews required to be considered (currently 10)
C = the mean across the list (currently 3.66)

In other words, BeerAd­vo­cate does the equiv­a­lent of giv­ing each beer a set num­ber (cur­rently 10) of “av­er­age” re­views (i.e., re­views with a score of 3.66, which is the av­er­age for all beers on the site). Thus, a beer with zero re­views is as­sumed to be ex­actly as good as the av­er­age beer on the site; a beer with one re­view will still be as­sumed to be close to av­er­age, no mat­ter what rat­ing the one re­view gives; as the num­ber of re­views grows, the beer’s rat­ing is able to de­vi­ate more from the av­er­age.

To illus­trate this, the fol­low­ing chart shows how BeerAd­vo­cate’s for­mula would rate a beer that has 0-100 five-star re­views. As the num­ber of five-star re­views grows, the for­mula’s “con­fi­dence” in the five-star rat­ing grows, and the beer’s over­all rat­ing gets fur­ther from “av­er­age” and closer to (though never fully reach­ing) 5 stars.

I find BeerAd­vo­cate’s ap­proach to be quite rea­son­able and I find the chart above to ac­cord quite well with in­tu­ition: a beer with a small hand­ful of five-star re­views should be con­sid­ered pretty close to av­er­age, while a beer with a hun­dred five-star re­views should be con­sid­ered to be nearly a five-star beer.
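As a rough illustration of the formula quoted above, here is a minimal Python sketch. The function name `weighted_rank` and its parameter names are hypothetical labels, not anything BeerAdvocate publishes, and reusing BeerAdvocate's m = 10 and C = 3.66 for the two Yelp restaurants is purely an assumption made for the sake of the example.

```python
def weighted_rank(review_avg, num_reviews, min_reviews=10, site_mean=3.66):
    """Weighted rank as quoted above: WR = (v/(v+m))*R + (m/(v+m))*C.

    review_avg  -> R, num_reviews -> v,
    min_reviews -> m (assumed 10), site_mean -> C (assumed 3.66).
    """
    v, m = num_reviews, min_reviews
    return (v / (v + m)) * review_avg + (m / (v + m)) * site_mean


# A beer with a growing number of five-star reviews.
for n in (0, 1, 5, 10, 50, 100):
    print(f"{n:3d} five-star reviews -> weighted rank {weighted_rank(5.0, n):.2f}")

# The restaurant comparison, borrowing the beer parameters (an assumption).
new_place = weighted_rank(5.0, 3)     # 3 reviews averaging 5 stars
old_place = weighted_rank(4.75, 200)  # 200 reviews averaging 4.75 stars
print(f"new restaurant: {new_place:.2f}, older restaurant: {old_place:.2f}")
```

Under these assumed parameters the new restaurant scores about 3.97 and the older one about 4.70, so the adjusted ranking matches the intuitive one even though the raw averages point the other way.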

How­ever, there are a cou­ple of com­pli­ca­tions that make it difficult to ap­ply this ap­proach broadly.

  • BeerAd­vo­cate is mak­ing a sub­stan­tial judg­ment call re­gard­ing what “prior” to use, i.e., how strongly to as­sume each beer is av­er­age un­til proven oth­er­wise. It cur­rently sets the m in its for­mula equal to 10, which is like giv­ing each beer a start­ing point of ten av­er­age-level re­views; it gives no for­mal jus­tifi­ca­tion for why it has set m to 10 in­stead of 1 or 100. It is un­clear what such a jus­tifi­ca­tion would look like. In fact, I be­lieve that BeerAd­vo­cate used to use a stronger “prior” (i.e., it used to set m to a higher value), which meant that beers needed larger num­bers of re­views to make the top-rated list. When BeerAd­vo­cate changed its prior, its rank­ings changed dra­mat­i­cally, as lesser-known, higher-rated beers over­took the main­stream beers that had pre­vi­ously dom­i­nated the list.

  • In BeerAd­vo­cate’s case, the ba­sic ap­proach to set­ting a Bayesian prior seems pretty straight­for­ward: the “prior” rat­ing for a given beer is equal to the av­er­age rat­ing for all beers on the site, which is known. By con­trast, if we’re look­ing at the es­ti­mate of how much good a char­ity does, it isn’t clear what “av­er­age” one can use for a prior; it isn’t even clear what the ap­pro­pri­ate refer­ence class is. Should our prior value for the good-ac­com­plished-per-dol­lar of a de­worm­ing char­ity be equal to the good-ac­com­plished-per-dol­lar of the av­er­age de­worm­ing char­ity, or of the av­er­age health char­ity, or the av­er­age char­ity, or the av­er­age al­tru­is­tic ex­pen­di­ture, or some weighted av­er­age of these? Of course, we don’t ac­tu­ally have any of these figures. For this rea­son, it’s hard to for­mally jus­tify one’s prior, and differ­ences in pri­ors can cause ma­jor dis­agree­ments and con­fu­sions when they aren’t rec­og­nized for what they are. But this doesn’t mean the choice of prior should be ig­nored or that one should leave the prior out of ex­pected-value calcu­la­tions (as we be­lieve EEV ad­vo­cates do).

Ap­ply­ing Bayesian ad­just­ments to cost-effec­tive­ness es­ti­mates for dona­tions, ac­tions, etc.

As dis­cussed above, we be­lieve that both Giv­ing What We Can and Back of the En­velope Guide to Philan­thropy use forms of EEV anal­y­sis in ar­gu­ing for their char­ity recom­men­da­tions. How­ever, when it comes to an­a­lyz­ing the cost-effec­tive­ness es­ti­mates they in­voke, the BeerAd­vo­cate for­mula doesn’t seem ap­pli­ca­ble: there is no “num­ber of re­views” figure that can be used to de­ter­mine the rel­a­tive weights of the prior and the es­ti­mate.

Instead, we propose a model in which there is a normally (or log-normally) distributed “estimate error” around the cost-effectiveness estimate (with a mean of “no error,” i.e., 0 for normally distributed error and 1 for log-normally distributed error), and in which the prior distribution for cost-effectiveness is normally (or log-normally) distributed as well. (I won’t discuss log-normal distributions in this post, but the analysis I give can be extended by applying it to the log of the variables in question.) The more one feels confident in one’s pre-existing view of how cost-effective a donation or action should be, the smaller the variance of the “prior”; the more one feels confident in the cost-effectiveness estimate itself, the smaller the variance of the “estimate error.”

Following up on our 2010 exchange with Giving What We Can, we asked Dario Amodei to write up the implications of the above model and the form of the proper Bayesian adjustment. You can see his analysis here. The bottom line is that when one applies Bayes’s rule to obtain a distribution for cost-effectiveness based on (a) a normally distributed prior distribution and (b) a normally distributed “estimate error,” one obtains a distribution with:

  • Mean equal to the av­er­age of the two means weighted by their in­verse var­i­ances

  • Variance equal to the harmonic sum of the two variances (i.e., the reciprocal of the sum of their reciprocals)
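Written out, the two statements above take the following form. The symbols are notation assumed here for illustration (they do not appear in the analysis as quoted): the prior has mean \mu_0 and variance \sigma_0^2, and the estimate has midpoint \mu_e with error variance \sigma_e^2.

```latex
\[
\mu_{\text{post}}
  = \frac{\mu_0/\sigma_0^2 + \mu_e/\sigma_e^2}{1/\sigma_0^2 + 1/\sigma_e^2},
\qquad
\sigma_{\text{post}}^2
  = \frac{1}{1/\sigma_0^2 + 1/\sigma_e^2}
\]
% Worked case: prior mean 0 and variance 1, estimate 10 with error variance 1:
\[
\mu_{\text{post}} = \frac{0/1 + 10/1}{1/1 + 1/1} = 5,
\qquad
\sigma_{\text{post}}^2 = \frac{1}{1 + 1} = \tfrac{1}{2}
\]
```

When the prior and the estimate are equally uncertain, the adjusted mean lands exactly halfway between them.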

The following charts show what this formula implies in a variety of different simple hypotheticals. In all of these, the prior distribution has mean = 0 and standard deviation = 1, and the estimate has mean = 10, but the “estimate error” varies, with important effects: an estimate with little enough estimate error can almost be taken literally, while an estimate with large enough estimate error ought to be almost ignored.

In each of these charts, the black line rep­re­sents a prob­a­bil­ity den­sity func­tion for one’s “prior,” the red line for an es­ti­mate (with the var­i­ance com­ing from “es­ti­mate er­ror”), and the blue line for the fi­nal prob­a­bil­ity dis­tri­bu­tion, tak­ing both the prior and the es­ti­mate into ac­count. Taller, nar­rower dis­tri­bu­tions rep­re­sent cases where prob­a­bil­ity is con­cen­trated around the mid­point; shorter, wider dis­tri­bu­tions rep­re­sent cases where the pos­si­bil­ities/​prob­a­bil­ities are more spread out among many val­ues. First, the case where the cost-effec­tive­ness es­ti­mate has the same con­fi­dence in­ter­val around it as the prior:

If one has a relatively reliable estimate (i.e., one with a narrow confidence interval / small variance of “estimate error”), then the Bayesian-adjusted conclusion ends up very close to the estimate. When we estimate quantities using highly precise and well-understood methods, we can use them (almost) literally.

On the flip side, when the es­ti­mate is rel­a­tively un­re­li­able (wide con­fi­dence in­ter­val /​ large var­i­ance of “es­ti­mate er­ror”), it has lit­tle effect on the fi­nal ex­pec­ta­tion of cost-effec­tive­ness (or what­ever is be­ing es­ti­mated). And at the point where the one-stan­dard-de­vi­a­tion bands in­clude zero cost-effec­tive­ness (i.e., where there’s a pretty strong prob­a­bil­ity that the whole cost-effec­tive­ness es­ti­mate is worth­less), the es­ti­mate ends up hav­ing prac­ti­cally no effect on one’s fi­nal view.
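The hypotheticals described above (prior with mean 0 and standard deviation 1, estimate with midpoint 10 and varying error) can be tabulated with a short Python sketch. The function name `bayesian_adjust` and the particular error values tried are assumptions chosen for illustration; the arithmetic is the standard normal-prior, normal-error update described earlier.

```python
import math


def bayesian_adjust(prior_mean, prior_sd, est_mean, est_sd):
    """Combine a normal prior with a normally distributed estimate error.

    Returns (posterior mean, posterior standard deviation). The mean is the
    inverse-variance-weighted average of the two means; the variance is the
    reciprocal of the summed inverse variances.
    """
    prior_precision = 1.0 / prior_sd ** 2
    est_precision = 1.0 / est_sd ** 2
    post_var = 1.0 / (prior_precision + est_precision)
    post_mean = post_var * (prior_precision * prior_mean + est_precision * est_mean)
    return post_mean, math.sqrt(post_var)


# Prior: mean 0, sd 1. Estimate midpoint: 10. Vary only the estimate error.
for est_sd in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    mean, sd = bayesian_adjust(0.0, 1.0, 10.0, est_sd)
    print(f"estimate error sd {est_sd:5.1f} -> adjusted mean {mean:6.3f}, sd {sd:.3f}")
```

With an error standard deviation of 0.1 the adjusted mean is about 9.9 (the estimate is taken almost literally); with an error standard deviation of 10 it is about 0.1 (the estimate is almost ignored).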

The de­tails of how to ap­ply this sort of anal­y­sis to cost-effec­tive­ness es­ti­mates for char­i­ta­ble in­ter­ven­tions are out­side the scope of this post, which fo­cuses on our be­lief in the im­por­tance of the con­cept of Bayesian ad­just­ments. The big-pic­ture take­away is that just hav­ing the mid­point of a cost-effec­tive­ness es­ti­mate is not worth very much in it­self; it is im­por­tant to un­der­stand the sources of es­ti­mate er­ror, and the de­gree of es­ti­mate er­ror rel­a­tive to the de­gree of vari­a­tion in es­ti­mated cost-effec­tive­ness for differ­ent in­ter­ven­tions.

Pas­cal’s Mugging

Pas­cal’s Mug­ging refers to a case where a claim of ex­trav­a­gant im­pact is made for a par­tic­u­lar ac­tion, with lit­tle to no ev­i­dence:

Now suppose someone comes to me and says, “Give me five dollars, or I’ll use my magic powers … to [harm an imaginably huge number of] people.”

Non-Bayesian ap­proaches to eval­u­at­ing these pro­pos­als of­ten take the fol­low­ing form: “Even if we as­sume that this anal­y­sis is 99.99% likely to be wrong, the ex­pected value is still high—and are you will­ing to bet that this anal­y­sis is wrong at 99.99% odds?”

How­ever, this is a case where “es­ti­mate er­ror” is prob­a­bly ac­count­ing for the lion’s share of var­i­ance in es­ti­mated ex­pected value, and there­fore I be­lieve that a proper Bayesian ad­just­ment would cor­rectly as­sign lit­tle value where there is lit­tle ba­sis for the es­ti­mate, no mat­ter how high the mid­point of the es­ti­mate.

Say that you’ve come to be­lieve—based on life ex­pe­rience—in a “prior dis­tri­bu­tion” for the value of your ac­tions, with a mean of zero and a stan­dard de­vi­a­tion of 1. (The unit type you use to value your ac­tions is ir­rele­vant to the point I’m mak­ing; so in this case the units I’m us­ing are sim­ply stan­dard de­vi­a­tions based on your prior dis­tri­bu­tion for the value of your ac­tions). Now say that some­one es­ti­mates that ac­tion A (e.g., giv­ing in to the mug­ger’s de­mands) has an ex­pected value of X (same units) - but that the es­ti­mate it­self is so rough that the right ex­pected value could eas­ily be 0 or 2X. More speci­fi­cally, say that the er­ror in the ex­pected value es­ti­mate has a stan­dard de­vi­a­tion of X.

An EEV approach to this situation might say, “Even if there’s a 99.99% chance that the estimate is completely wrong and that the value of Action A is 0, there’s still a 0.01% probability that Action A has a value of X. Thus, overall Action A has an expected value of at least 0.0001X; the greater X is, the greater this value is, and if X is great enough, then you should take Action A unless you’re willing to bet at enormous odds that the framework is wrong.”

However, the same formula discussed above indicates that Action A actually has an expected value, after the Bayesian adjustment, of X/(X^2+1), or just under 1/X. In this framework, once X exceeds 1, the greater X is, the lower the expected value of Action A. This syncs well with my intuitions: if someone threatened to harm one person unless you gave them $10, this ought to carry more weight (because it is more plausible in the face of the “prior” of life experience) than if they threatened to harm 100 people, which in turn ought to carry more weight than if they threatened to harm 3^^^3 people (I’m using 3^^^3 here as a representation of an unimaginably huge number).
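To make the contrast concrete, here is a small Python sketch comparing the EEV-style floor of 0.0001X with the Bayesian-adjusted value X/(X^2+1), under the stated setup (prior mean 0 and standard deviation 1, estimate X with error standard deviation X). The function names and the sample values of X are hypothetical choices for illustration.

```python
def bayesian_adjusted_value(x):
    """Posterior mean for prior N(0, 1) and an estimate x whose error has sd x.

    Inverse-variance weighting gives (x / x**2) / (1 + 1 / x**2),
    which simplifies to x / (x**2 + 1). Assumes x > 0.
    """
    return x / (x ** 2 + 1)


def eev_floor(x, prob_estimate_right=0.0001):
    """EEV-style lower bound: a small chance the claim is right, times x."""
    return prob_estimate_right * x


for x in (1, 10, 100, 10_000, 1_000_000):
    print(f"X = {x:>9,}: EEV floor = {eev_floor(x):12.4f}, "
          f"Bayesian-adjusted = {bayesian_adjusted_value(x):.8f}")
```

The EEV floor grows without bound as X grows, while the adjusted value stays just under 1/X and shrinks toward zero.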

The point at which a threat or pro­posal starts to be called “Pas­cal’s Mug­ging” can be thought of as the point at which the claimed value of Ac­tion A is wildly out­side the prior set by life ex­pe­rience (which may cause the feel­ing that com­mon sense is be­ing vi­o­lated). If some­one claims that giv­ing him/​her $10 will ac­com­plish 3^^^3 times as much as a 1-stan­dard-de­vi­a­tion life ac­tion from the ap­pro­pri­ate refer­ence class, then the ac­tual post-ad­just­ment ex­pected value of Ac­tion A will be just un­der (1/​3^^^3) (in stan­dard de­vi­a­tion terms) - only triv­ially higher than the value of an av­er­age ac­tion, and likely lower than other ac­tions one could take with the same re­sources. This is true with­out ap­ply­ing any par­tic­u­lar prob­a­bil­ity that the per­son’s frame­work is wrong—it is sim­ply a func­tion of the fact that their es­ti­mate has such enor­mous pos­si­ble er­ror. An un­grounded es­ti­mate mak­ing an ex­trav­a­gant claim ought to be more or less dis­carded in the face of the “prior dis­tri­bu­tion” of life ex­pe­rience.

Gen­er­al­iz­ing the Bayesian approach

In the above cases, I’ve given quan­tifi­ca­tions of (a) the ap­pro­pri­ate prior for cost-effec­tive­ness; (b) the strength/​con­fi­dence of a given cost-effec­tive­ness es­ti­mate. One needs to quan­tify both (a) and (b) - not just quan­tify es­ti­mated cost-effec­tive­ness—in or­der to for­mally make the needed Bayesian ad­just­ment to the ini­tial es­ti­mate.

But when it comes to giv­ing, and many other de­ci­sions, rea­son­able quan­tifi­ca­tion of these things usu­ally isn’t pos­si­ble. To have a prior, you need a refer­ence class, and refer­ence classes are de­bat­able.

It’s my view that my brain in­stinc­tively pro­cesses huge amounts of in­for­ma­tion, com­ing from many differ­ent refer­ence classes, and ar­rives at a prior; if I at­tempt to for­mal­ize my prior, count­ing only what I can name and jus­tify, I can worsen the ac­cu­racy a lot rel­a­tive to go­ing with my gut. Of course there is a prob­lem here: go­ing with one’s gut can be an ex­cuse for go­ing with what one wants to be­lieve, and a lot of what en­ters into my gut be­lief could be ir­rele­vant to proper Bayesian anal­y­sis. There is an ap­peal to for­mu­las, which is that they seem to be sus­cep­ti­ble to out­siders’ check­ing them for fair­ness and con­sis­tency.

But when the for­mu­las are too rough, I think the loss of ac­cu­racy out­weighs the gains to trans­parency. Rather than us­ing a for­mula that is check­able but omits a huge amount of in­for­ma­tion, I’d pre­fer to state my in­tu­ition—with­out pre­tense that it is any­thing but an in­tu­ition—and hope that the en­su­ing dis­cus­sion pro­vides the needed check on my in­tu­itions.

I can’t, there­fore, use­fully say what I think the ap­pro­pri­ate prior es­ti­mate of char­ity cost-effec­tive­ness is. I can, how­ever, de­scribe a cou­ple of ap­proaches to Bayesian ad­just­ments that I op­pose, and can de­scribe a few heuris­tics that I use to de­ter­mine whether I’m mak­ing an ap­pro­pri­ate Bayesian ad­just­ment.

Ap­proaches to Bayesian ad­just­ment that I oppose

I have seen some ar­gue along the lines of “I have a very weak (or un­in­for­ma­tive) prior, which means I can more or less take rough es­ti­mates liter­ally.” I think this is a mis­take. We do have a lot of in­for­ma­tion by which to judge what to ex­pect from an ac­tion (in­clud­ing a dona­tion), and failure to use all the in­for­ma­tion we have is a failure to make the ap­pro­pri­ate Bayesian ad­just­ment. Even just a sense for the val­ues of the small set of ac­tions you’ve taken in your life, and ob­served the con­se­quences of, gives you some­thing to work with as far as an “out­side view” and a start­ing prob­a­bil­ity dis­tri­bu­tion for the value of your ac­tions; this dis­tri­bu­tion prob­a­bly ought to have high var­i­ance, but when deal­ing with a rough es­ti­mate that has very high var­i­ance of its own, it may still be quite a mean­ingful prior.

I have seen some using the EEV framework who can tell that their estimates seem too optimistic, so they make various “downward adjustments,” multiplying their EEV by apparently ad hoc figures (1%, 10%, 20%). What isn’t clear is whether the size of the adjustment they’re making has the correct relationship to (a) the weakness of the estimate itself, (b) the strength of the prior, and (c) the distance of the estimate from the prior. An example of how this approach can go astray can be seen in the “Pascal’s Mugging” analysis above: assigning one’s framework a 99.99% chance of being totally wrong may seem to be amply conservative, but in fact the proper Bayesian adjustment is much larger and leads to a completely different conclusion.

Heuris­tics I use to ad­dress whether I’m mak­ing an ap­pro­pri­ate prior-based adjustment

  • The more ac­tion is asked of me, the more ev­i­dence I re­quire. Any­time I’m asked to take a sig­nifi­cant ac­tion (giv­ing a sig­nifi­cant amount of money, time, effort, etc.), this ac­tion has to have higher ex­pected value than the ac­tion I would oth­er­wise take. My in­tu­itive feel for the dis­tri­bu­tion of “how much my ac­tions ac­com­plish” serves as a prior—an ad­just­ment to the value that the asker claims for my ac­tion.

  • I pay at­ten­tion to how much of the vari­a­tion I see be­tween es­ti­mates is likely to be driven by true vari­a­tion vs. es­ti­mate er­ror. As shown above, when an es­ti­mate is rough enough so that er­ror might ac­count for the bulk of the ob­served vari­a­tion, a proper Bayesian ap­proach can in­volve a mas­sive dis­count to the es­ti­mate.

  • I put much more weight on conclusions that seem to be supported by multiple different lines of analysis, as unrelated to one another as possible. If one starts with a high-error estimate of expected value, and then starts finding more estimates with the same midpoint, the variance of the aggregate estimate error declines; the less correlated the estimates are, the greater the decline in the variance of the error, and thus the smaller the Bayesian adjustment to the final estimate. This is a formal way of observing that “diversified” reasons for believing something lead to more “robust” beliefs, i.e., beliefs that are less likely to fall apart with new information and can be used with less skepticism.

  • I am hes­i­tant to em­brace ar­gu­ments that seem to have anti-com­mon-sense im­pli­ca­tions (un­less the ev­i­dence be­hind these ar­gu­ments is strong) and I think my prior may of­ten be the rea­son for this. As seen above, a too-weak prior can lead to many seem­ingly ab­surd be­liefs and con­se­quences, such as fal­ling prey to “Pas­cal’s Mug­ging” and re­mov­ing the in­cen­tive for in­ves­ti­ga­tion of strong claims. Strength­en­ing the prior fixes these prob­lems (while over-strength­en­ing the prior re­sults in sim­ply ig­nor­ing new ev­i­dence). In gen­eral, I be­lieve that when a par­tic­u­lar kind of rea­son­ing seems to me to have anti-com­mon-sense im­pli­ca­tions, this may in­di­cate that its im­pli­ca­tions are well out­side my prior.

  • My prior for charity is generally skeptical, as outlined in this post. Giving well seems conceptually quite difficult to me, and it’s been my experience over time that the more we dig on a cost-effectiveness estimate, the more unwarranted optimism we uncover. Also, having an optimistic prior would mean giving to opaque charities, and that seems to violate common sense. Thus, we look for charities with quite strong evidence of effectiveness, and tend to prefer very strong charities with reasonably high estimated cost-effectiveness to weaker charities with very high estimated cost-effectiveness.
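The heuristic above about multiple lines of analysis can be sketched numerically. This is a simplified illustration under assumptions not stated in the post: the estimates are combined by a simple average, every estimate has the same error variance, and every pair of estimates shares the same correlation. The function names are hypothetical.

```python
def aggregate_error_variance(error_var, n, corr):
    """Error variance of the simple average of n estimates.

    Assumes each estimate has error variance `error_var` and every pair of
    errors has correlation `corr` (which must be >= -1/(n-1) to be valid).
    Var(mean) = error_var * (1 + (n - 1) * corr) / n.
    """
    return error_var * (1 + (n - 1) * corr) / n


def adjusted_mean(prior_mean, prior_var, est_mean, est_var):
    """Inverse-variance-weighted average of the prior mean and the estimate."""
    w_prior, w_est = 1.0 / prior_var, 1.0 / est_var
    return (w_prior * prior_mean + w_est * est_mean) / (w_prior + w_est)


# Prior N(0, 1); each estimate has midpoint 10 and error sd 10 (variance 100).
for n, corr in ((1, 0.0), (4, 1.0), (4, 0.5), (4, 0.0), (16, 0.0)):
    var = aggregate_error_variance(100.0, n, corr)
    print(f"n = {n:2d}, correlation = {corr:.1f}: error variance = {var:6.2f}, "
          f"adjusted mean = {adjusted_mean(0.0, 1.0, 10.0, var):.3f}")
```

Four perfectly correlated estimates are no better than one, while four uncorrelated estimates cut the error variance to a quarter, so the adjusted mean moves further from the prior toward the shared midpoint.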


Conclusion

  • I feel that any giving approach that relies only on estimated expected value (and does not incorporate preferences for better-grounded estimates over shakier estimates) is flawed.

  • Thus, when aiming to max­i­mize ex­pected pos­i­tive im­pact, it is not ad­vis­able to make giv­ing de­ci­sions based fully on ex­plicit for­mu­las. Proper Bayesian ad­just­ments are im­por­tant and are usu­ally overly difficult to for­mal­ize.