Model Stability in Intervention Assessment

In this post, I hope to ex­am­ine the Bayesian Ad­just­ment paradigm pre­sented by Holden Karnofsky of Givewell from a math­e­mat­i­cal view­point, in par­tic­u­lar look­ing at how we can rigor­ously man­age the no­tion of un­cer­tainty in our mod­els and the sta­bil­ity of an es­ti­mate. Sev­eral re­cent posts have touched on re­lated is­sues.

In practice, we will need to have some substantive prior on the likely range of impacts that interventions can achieve, and I will look briefly at what kinds of log-ranges are supported in the literature, and the extent to which these can preclude extreme impact scenarios. I will then briefly look at less formal notions of confidence in a model, which may be more tractable either computationally or for heuristic purposes than a formal Bayesian approach.

Bayesian Ad­just­ment, and the Ap distribution

In the set­ting origi­nally pro­posed, the BA frame­work takes a back­ground prior on im­pacts and a noisy mea­sure­ment of fixed var­i­ance of a fixed im­pact pa­ram­e­ter. In this set­ting, the BA ap­proach is prov­ably cor­rect. Un­for­tu­nately, the real world is not so ac­com­mo­dat­ing; for gen­eral ev­i­dence about an in­ter­ven­tion, the BA ap­proach is not fully Bayesian. In this sense it un­avoid­ably mis­counts ev­i­dence. The gen­eral prob­lem can be illus­trated by work­ing through the pro­cess for­mally. Con­sider propo­si­tions:

x := Has Im­pact x,
E := Back­ground data,
C := there ex­ists a given com­pu­ta­tion or ar­gu­ment to a given im­pact y.

We suppose for the framework that we have P(x|E) and P(x|C) for each x. Since the propositions {x} are disjoint and exhaustive, these form distributions. For inference, what we actually want is P(x|EC). In the BA framework, we compute P(x|E)P(x|C) for each x, and normalise to get a distribution. Computing a Bayesian update, we have:

P(x|EC) = P(xEC)/​P(EC) = P(C|xE)P(x|E)/​P(C|E).

So if the BA framework is to give the correct answer, we need to have P(x|EC) ∝ P(x|E)P(x|C), so that the normalisation in the BA framework fixes everything correctly. Since P(C|E) is also just a normalising factor, this proportionality occurs if and only if P(C|xE) ∝ P(x|C) as functions of x, which does not hold in general. In the precise setting that was originally proposed for the BA framework, there are two special features. Firstly, the estimate is a noisy measurement of x, and so P(C|x) = P(C|xE) because all dependence on the world factors through x. Secondly, P(C|x) ∝ P(x|C), and so the Bayesian and BA results coincide.
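As a numerical check of the special case just described, the sketch below assumes a normal prior and a normal measurement of fixed variance. The values `mu0`, `sigma0`, `y` and `sigma` are illustrative assumptions, not figures from the BA posts, and P(x|C) is taken to be the normalised likelihood (that is, a flat reference prior on x is assumed). Under those assumptions the normalised product P(x|E)P(x|C) matches the Bayesian posterior on a grid, and its mean matches the usual precision-weighted formula.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normalise(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative values (assumptions, not taken from the original posts).
mu0, sigma0 = 0.0, 1.0   # background prior P(x|E)
y, sigma = 3.0, 2.0      # the estimate and the standard deviation of its noise

xs = [i * 0.01 for i in range(-2000, 2001)]   # grid over x from -20 to 20

prior = normalise([normal_pdf(x, mu0, sigma0) for x in xs])   # P(x|E)
likelihood = [normal_pdf(y, x, sigma) for x in xs]            # P(C|x) = P(C|xE)
p_x_given_c = normalise(likelihood)                           # P(x|C), flat reference prior

# BA recipe: multiply the two distributions and normalise.
ba = normalise([p * q for p, q in zip(prior, p_x_given_c)])
# Bayesian update: likelihood times prior, normalised.
bayes = normalise([l * p for l, p in zip(likelihood, prior)])

ba_mean = sum(x * w for x, w in zip(xs, ba))
closed_form = (mu0 / sigma0**2 + y / sigma**2) / (1 / sigma0**2 + 1 / sigma**2)

print(ba_mean, closed_form)
```

The agreement relies entirely on the likelihood depending on the world only through x; nothing in this sketch tests the general case where P(C|xE) differs from P(C|x).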

However, when we investigate an indirect intervention we are typically looking at estimates derived non-trivially from the world; as a result, P(C|xE) ≠ P(C|x), and the BA framework breaks down. Put another way, when we look for estimates and find one, we have learned something about the world. If we don’t account for this properly, we will draw incorrect conclusions.

In particular, it is reasonable to expect that the existence of estimates implying unusual values for an intervention should positively correlate with background states of the world which permit unusual values for the intervention. The BA framework does not account for this, and so heuristically it will overly penalise estimates of interventions which yield results far from the prior distribution. Of course, we can reasonably ask whether it is feasible to compute P(x|EC) explicitly; in general, fully Bayesian work is hard.

Jaynes (Prob­a­bil­ity The­ory: The Logic of Science, Chap­ter 18) deals with a sim­pler ex­am­ple of the same ba­sic prob­lem, where we are asked to as­cribe cre­dence to a propo­si­tion like

A := “when I flip this coin it will come up heads”.

In­stead of merely hav­ing a be­lief about the dis­tri­bu­tion over out­comes (analo­gous to P(x|E) in the BA case), it turns out to be nec­es­sary to keep track of a dis­tri­bu­tion over propo­si­tions of form:

Ap := “the sub­jec­tive prob­a­bil­ity of Heads is p, re­gard­less of any other ev­i­dence”;

or, more formally, we define P(A|ApE) = p. Hence the events Ap are disjoint, and exactly one is true. We therefore have an object which behaves like a probability distribution over Ap; we can abuse terminology and use probability directly. Jaynes then shows that:

P(A) = ∫p P(Ap) dp

And so we can recover P(A) from the P(Ap). The full Ap distribution is needed to formalise confidence in one’s estimate. For example, if one is sure from background data E that the coin is completely biased, then one trial flip F will tell you which way the coin is biased, and so P(A|EF) will be almost 0 or 1, whilst P(A|E) = ½. On the other hand, if you have background information E’ that of 10000 trial flips 5000 were heads, then one additional trial flip F leaves P(A|E’F) ≈ P(A|E’) = ½. Jaynes shows that the Ap distribution screens off E, and can be updated in light of new data F; the posterior P(A|EF) is then the mean of the new Ap distribution. In this framework, and starting from a uniform prior over Ap, Laplace’s law of succession is derived.
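The two coin examples can be sketched with exact arithmetic. The sketch represents the "completely biased" background as a two-point Ap distribution on p = 0 and p = 1. As an assumption of the sketch, it treats the well-tested coin as a uniform prior over Ap updated on 5000 heads in 10000 flips, so that Laplace's rule of succession gives the mean. The function names are hypothetical.

```python
from fractions import Fraction

def mean(ap):
    """P(A | ...) recovered as the mean of a discrete Ap distribution {p: weight}."""
    return sum(p * w for p, w in ap.items())

def update(ap, heads):
    """Bayes update of a discrete Ap distribution after observing one flip."""
    unnorm = {p: w * (p if heads else 1 - p) for p, w in ap.items()}
    total = sum(unnorm.values())
    return {p: w / total for p, w in unnorm.items()}

def succession_mean(heads, flips):
    """Mean of the Ap distribution after `heads` heads in `flips` flips,
    starting from a uniform prior over Ap (Laplace's rule of succession)."""
    return Fraction(heads + 1, flips + 2)

# Background E: the coin is completely biased, direction unknown.
biased = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
before_biased = mean(biased)                      # 1/2
after_biased = mean(update(biased, heads=True))   # 1

# Background E': 5000 heads in 10000 flips (uniform prior assumed beforehand).
before_tested = succession_mean(5000, 10000)      # 1/2
after_tested = succession_mean(5001, 10001)       # 5002/10003

print(before_biased, after_biased, before_tested, after_tested)
```

Both backgrounds give the same point probability of ½ before the extra flip, but the same single observation moves one to certainty and leaves the other essentially unchanged. That difference lives in the shape of the Ap distribution, not in its mean.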

To generalise this framework to estimating a real value x rather than a binary outcome A, we can shift from a distribution Ap over probabilities of A to a distribution P(Xd) over distributions for x, with Xd := “x ~ d regardless of other evidence” [1]. In this setting, there will still be a “point estimate” distribution X, the mean of Xd, which summarises your current beliefs about x. Other information about Xd is needed to allow you to update coherently in response to arbitrary new information. In such a case, new information may cause one to substantially change the distribution Xd, and thus one’s beliefs about the world, if this new information causes a great deal of surprise conditional on X.
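A toy version of this can be sketched with only two candidate distributions d. Everything here is an assumption made for illustration: the candidates `narrow` and `wide`, their weights, and in particular the modelling choice that the new information is an independent draw from the same d (for instance, the measured impact of a comparable intervention). The point estimate X is then the weighted mixture, and a surprising observation shifts almost all of the weight.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical candidate distributions d, given as (mean, standard deviation).
candidates = {"narrow": (0.0, 1.0), "wide": (0.0, 10.0)}
# Hypothetical weights P(Xd): we start out fairly sure the narrow model holds.
weights = {"narrow": 0.95, "wide": 0.05}

def point_density(weights, x):
    """Density at x of the point-estimate distribution X, the mean of Xd."""
    return sum(w * normal_pdf(x, *candidates[name]) for name, w in weights.items())

def update(weights, observation):
    """Reweight the candidates by how well each predicted the observation.
    Assumes the observation is an independent draw from the true d."""
    unnorm = {name: w * normal_pdf(observation, *candidates[name])
              for name, w in weights.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}

unsurprising = update(weights, 0.5)   # well inside what X expects
surprising = update(weights, 8.0)     # far out in the tail of X

print(unsurprising)
print(surprising)
```

The summary distribution X alone would not tell you how far to move after seeing the value 8.0; the weights over the candidates are what determine the size of the shift.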

Ex­am­ples and Pri­ors in the BA framework

The math­e­mat­ics can also re­veal when an in­tu­ition pump is bring­ing ex­tra in­for­ma­tion in a non-ob­vi­ous way. For ex­am­ple, some of the ex­am­ples given for how the BA frame­work should run had the ap­par­ently un­in­tu­itive fea­ture that suc­ces­sively larger claims of im­pact even­tu­ally lead to de­creas­ing pos­te­rior means to the es­ti­mates. This turns out to be be­cause the stan­dard de­vi­a­tion of the es­ti­mates was pre­sumed to be roughly equal to their mean.

De facto this means that the new evidence was prohibited a priori from suggesting that an intervention was better than the prior mean with high probability. In general, this need not hold, if we are able to find data which is reasonably constrained and not present in the background model. If we intend to also account for the possibilities of errors in cognition, then this kind of treatment of new evidence seems more reasonable, but then we should see similar broadening in our background prior.

Similarly, as the stated BA priors are normal or log-normal, they assert that the event R := “the range of intervention impact ratios is large” has very low probability. Some decay is necessary to prevent arbitrarily large impacts dominating, which would make expected value computations fail to converge. Practically, this implies that a stated prior for impacts drops off faster than 1/impact² above some impact [2], but this does not in and of itself mandate a specific form of prior, nor specify the point above which the prior should drop rapidly, nor the absolute rate of the drop-off. In particular, the log-normal and normal priors drop off much faster, and so are implicitly very confident that the range of impacts is bounded by what we’ve already seen.
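The convergence requirement can be made explicit for a pure power-law tail. This is a sketch under the assumption that, beyond some impact x_0, the prior density is exactly c x^{-α}; the symbols c, x_0 and α are introduced here for illustration only, and the `cases` environment assumes the amsmath package.

```latex
% Contribution of the tail to the expected impact, assuming p(x) = c x^{-\alpha} for x >= x_0.
\[
\int_{x_0}^{\infty} x \, p(x) \, dx
  = c \int_{x_0}^{\infty} x^{1-\alpha} \, dx
  = \begin{cases}
      \dfrac{c \, x_0^{2-\alpha}}{\alpha - 2}, & \alpha > 2, \\[1ex]
      \infty, & \alpha \le 2.
    \end{cases}
\]
% A log-normal density, by contrast, has the form
\[
p(x) \propto \frac{1}{x} \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right),
\]
% which eventually falls below c x^{-\alpha} for every fixed \alpha.
```

So any tail exponent above 2 keeps the expected value finite, while a log-normal commits to a decay faster than every such power law.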

What is the range of im­pacts for in­ter­ven­tions?

It is not trivial to find out what kinds of ratios we should expect to see; for these purposes it is unfortunate that Givewell does not publish $/DALY or $/life estimates of impact for the majority of the charities it assesses. It would be very useful to see what kinds of impacts are being sampled at the low end. Other studies (e.g. DCP2) have assessed some hundreds of high and low impact interventions in public health, and assert 10000:1 ratios in impact, with their best $/DALY numbers consistent with Givewell’s assessment that AMF is likely to be one of the better public health interventions available.

Of course, we also strongly suspect that there exist interventions with better impacts than AMF, if we are willing to look outside public health. Givewell’s raison d’être is that one can gain leverage in moving funds from ineffective causes to effective ones, and so a dollar spent on Givewell should move much more than a dollar to effective interventions. In principle this demonstrates that the range of possible intervention impacts may be much larger than the range available in specific fields, such as developing world health interventions.

By the lights of the BA prior, we are uncharitable about an estimate of impact if we assert it is large, in that this makes the estimate less credible and thus heavily discounted. In this sense, existential risk reduction has been sketchily and optimistically estimated at around $0.125/life, which we can take as an uncharitable estimate for the BA framework. Assuming that this was a correct estimate, it being true would only require the existence of an intervention which is to AMF as AMF is to the least effective health interventions. It does not seem easy to assert confidently both that the tail thickness and variance of the distribution of intervention impacts make the ratios apparently observed in public health interventions and Givewell common enough to be searched for, and that estimates at the <$1/life level can be ruled out a priori.
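The "as AMF is to the least effective health interventions" comparison is a piece of arithmetic on the two figures already quoted. In the sketch below the benchmark cost per life is only what those two figures jointly imply; it is not a Givewell or AMF estimate, and the variable names are hypothetical.

```python
# Figures quoted in the text.
xrisk_cost_per_life = 0.125   # $/life, the optimistic existential risk estimate
health_ratio = 10_000         # best-to-worst impact ratio asserted within public health

# Cost per life that a top health intervention would need for the analogy to hold.
# This is an implied figure, not a published estimate.
implied_benchmark = xrisk_cost_per_life * health_ratio

# Applying the same ratio once more gives the implied cost per life at the low end.
implied_worst = implied_benchmark * health_ratio

print(implied_benchmark)   # 1250.0
print(implied_worst)       # 12500000.0
```

In log terms, the claimed intervention sits the same distance beyond the top of the public health range as that range is wide.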

Now, it might be pos­si­ble that these very high im­pact in­ter­ven­tions are not easy to scale up, or are rare enough that it is not worth search­ing for them. On the other hand, we can free-ride on other peo­ple recom­mend­ing in­ter­ven­tions, if we are will­ing to ac­cept in­ter­nal or in­side view as­sess­ments as sub­stan­tively cred­ible.

Con­fi­dence and Probability

It seems clear that the probability of a proposition and one’s confidence in the quality of one’s assessment are distinct, although it is easy to confuse the language by referring to confidence in a proposition, rather than in a probability or estimate. Fully rigorously, this is encompassed in the distribution over Xd, but in practice we may wish to track only a single posterior distribution [3].

Other commenters have suggested a similar distinction between confidence and probability, observing that, once one has seen that the computations exist, the correct response is to say “I notice that I am confused”. More formally, in practice we have neither P(x|C) nor P(x|E). We have to also condition on some event like:

N := “My mod­el­ling and com­pu­ta­tions are cor­rect”.

Ideally one would have extensive tests of all of the pieces of a methodology, so that one could say something about which classes of interventions are well modelled, but practically this may excessively complicate the issue. A priori, it seems unreasonable to attach a probability much greater than 1 - 1/1000 to propositions like N for a new method or model which has merely been output by human cognition. Assigning high confidence should wait on assessing the reliability and calibration of the methodology, or on showing that the model is a stable output of cognition.

In the event of a computation and a point prior belief about interventions disagreeing, a Bayesian update will reduce confidence in N, and will also lower the assessed reliability of the processes leading to the estimate C. This is separate from the process which causes you to extract beliefs about this particular intervention. Whether the background model is substantively changed or the estimation procedure is discounted is a matter for your relative confidence in these processes, and the sensitivity of the outputs of the processes.
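The effect of a disagreement on confidence in N can be sketched with a two-hypothesis update. The model is an assumption made for illustration: if N holds, the estimate should land where the prior expects it (standard deviation `prior_sd`), and if N fails, the estimate is treated as a draw from a much broader distribution (`broken_sd`). The numbers, including the 0.999 starting confidence, are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_n(prior_n, estimate, prior_mean=0.0, prior_sd=1.0, broken_sd=5.0):
    """P(N | estimate), where N := "my modelling and computations are correct".
    Assumes a correct method yields estimates consistent with the prior,
    and an incorrect one yields estimates spread with standard deviation broken_sd."""
    like_n = normal_pdf(estimate, prior_mean, prior_sd)
    like_not_n = normal_pdf(estimate, prior_mean, broken_sd)
    joint_n = prior_n * like_n
    return joint_n / (joint_n + (1.0 - prior_n) * like_not_n)

mild = posterior_n(0.999, estimate=1.0)      # estimate one prior sd from the mean
extreme = posterior_n(0.999, estimate=5.0)   # estimate five prior sds from the mean

print(mild, extreme)
```

Under these assumptions a mild disagreement barely moves confidence in N, while an estimate five standard deviations out leaves N with only a few percent. A different choice of `broken_sd`, or a model in which the background prior itself is doubted, would apportion the blame differently.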


Disagree­ments over how to es­ti­mate the im­pact of an in­ter­ven­tion on the world have ex­isted for some time, and it seems that the grounds for these dis­agree­ments are not be­ing well ad­dressed. In gen­eral, it would be a good thing for our grounds for con­fi­dence in ar­gu­ments and back­ground pri­ors to be made very ex­plicit and open. In prin­ci­ple we can then re­duce these dis­agree­ments to mat­ters of fact and differ­ences in prior be­liefs.

In the particular case of Givewell, it is clear that they have assessed a great many interventions systematically, and seem to possess a great deal of confidence in their modelled backgrounds. I do not know if there has been a formal process of checking the calibration of these estimates. If there has been, and Givewell can therefore assert with high confidence (say » 10 bits) propositions of the form “our model is emitting a suitable background correct for this class of interventions”, then the methods are very likely to be of great value to the wider EA community for other purposes, and ideally would be distributed.
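For readers who prefer probabilities, a confidence stated in bits can be converted as below. The sketch assumes "bits" means log-odds in base 2, which is one common reading and may not be exactly what is intended here; on the alternative reading of -log2 of the error probability, 10 bits would be 1 - 1/1024, which is nearly the same. The function names are hypothetical.

```python
import math

def bits_to_probability(bits):
    """Probability corresponding to odds of 2**bits to 1."""
    odds = 2.0 ** bits
    return odds / (1.0 + odds)

def probability_to_bits(p):
    """Log-odds of p in base 2."""
    return math.log2(p / (1.0 - p))

ten_bits = bits_to_probability(10)            # odds of 1024:1, about 0.99902
one_in_thousand = probability_to_bits(0.999)  # just under 10 bits

print(ten_bits, one_in_thousand)
```

On this reading, 10 bits sits at roughly the 1 - 1/1000 level discussed earlier for propositions like N, so "much more than 10 bits" is a demanding standard.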


I wrote this post whilst a vis­it­ing fel­low at MIRI; Luke­prog asked that I take a fur­ther look at LW’s de­bates on cost effec­tive­ness sta­bil­ity in effec­tive al­tru­ism, and try to clar­ify the situ­a­tion if pos­si­ble.

I am grate­ful to Carl Shul­man, Luke Muehlhauser and Adam Casey for their sub­stan­tive feed­back and com­ments on early drafts of this post.

[1] To follow the modified mathematics of Jaynes’ derivation closely, we amend 18-1 to read P(X = x|XdE) = d(x) for any distribution d, and then follow Jaynes’ derivation formally. It is reasonable to be worried that the space of distributions is not measurable; this can be fixed by restricting to a sigma-algebra of functions which are piecewise constant (or alternatively running Jaynes’ original approach on the set of binary propositions Ayz := “y ≤ x ≤ z” for all y and z).

[2] We could also assert strong cancellation properties, but it is unclear whether these effects can be substantial in practice. Technically, we could also get convergence with drop-offs like 1/(n² log² n) or 1/(n² log n log² log n), but the distinction is slight for the purposes of discussion; all of these decay much more slowly than a normal.

[3] If we work with the set of Ayz propositions instead, then Jaynes implies we have to hold a set of distributions (Ayz)p, which is rather more tractable than Xd, although harder to visualise concretely.