Model Stability in Intervention Assessment

In this post, I hope to examine the Bayesian Adjustment paradigm presented by Holden Karnofsky of Givewell from a mathematical viewpoint, in particular looking at how we can rigorously manage the notion of uncertainty in our models and the stability of an estimate. Several recent posts have touched on related issues.

In practice, we will need to have some substantive prior on the likely range of impacts that interventions can achieve, and I will look briefly at what kinds of log-ranges are supported in the literature, and the extent to which these can preclude extreme impact scenarios. I will then briefly look at less formal notions of confidence in a model, which may be more tractable either computationally or for heuristic purposes than a formal Bayesian approach.

Bayesian Adjustment, and the A_p distribution

In the setting originally proposed, the BA framework takes a background prior on impacts and a noisy measurement of fixed variance of a fixed impact parameter. In this setting, the BA approach is provably correct. Unfortunately, the real world is not so accommodating; for general evidence about an intervention, the BA approach is not fully Bayesian. In this sense it unavoidably miscounts evidence. The general problem can be illustrated by working through the process formally. Consider propositions:

x := Has Impact x,
E := Background data,
C := there exists a given computation or argument to a given impact y.

We suppose for the framework that we have P(x|E) and P(x|C) for each x. Since the propositions {x} are disjoint and exhaustive, these form distributions. For inference, what we actually want is P(x|EC). In the BA framework, we compute P(x|E)P(x|C) for each x, and normalise to get a distribution. Computing a Bayesian update, we have:

P(x|EC) = P(xEC)/P(EC) = P(C|xE)P(x|E)/P(C|E).

So if the BA framework is to give the correct answer, we need to have P(x|EC) ∝ P(x|E)P(x|C), so that the normalisation in the BA framework fixes everything correctly. Since P(C|E) is also just a normalising factor, this proportionality occurs if and only if P(C|xE) ∝ P(x|C) as functions of x, which does not hold in general. In the precise setting that was originally proposed for the BA framework, there are two special features. Firstly, the estimate is a noisy measurement of x, and so P(C|x) = P(C|xE) because all dependence on the world factors through x. Secondly, P(C|x) ∝ P(x|C), which by Bayes’ theorem amounts to treating x as having a flat prior when C is the only evidence, and so the Bayesian and BA results coincide.
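As a sanity check on the special case, here is a minimal numerical sketch. It assumes a normal prior P(x|E), a normal measurement with fixed standard deviation, and a flat prior behind P(x|C); the specific numbers (prior mean 0, prior sd 1, estimate 3, estimate sd 2) are made up for illustration. It compares the normalised BA product on a grid with the standard conjugate normal posterior.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def ba_posterior_on_grid(grid, prior_mean, prior_sd, estimate, estimate_sd):
    """Normalised pointwise product P(x|E) * P(x|C) on a uniform grid."""
    prior = normal_pdf(grid, prior_mean, prior_sd)            # P(x|E)
    # Assumption: flat prior on x, so P(x|C) is proportional to P(C|x).
    from_estimate = normal_pdf(grid, estimate, estimate_sd)   # P(x|C)
    product = prior * from_estimate
    dx = grid[1] - grid[0]
    return product / (product.sum() * dx)

def conjugate_posterior(prior_mean, prior_sd, estimate, estimate_sd):
    """Closed-form normal-normal posterior (precision-weighted average)."""
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    post_var = 1.0 / (prior_precision + estimate_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + estimate_precision * estimate)
    return post_mean, np.sqrt(post_var)

grid = np.linspace(-20.0, 20.0, 40001)
ba = ba_posterior_on_grid(grid, 0.0, 1.0, 3.0, 2.0)
mean, sd = conjugate_posterior(0.0, 1.0, 3.0, 2.0)
exact = normal_pdf(grid, mean, sd)
max_gap = float(np.max(np.abs(ba - exact)))
print(f"posterior mean {mean:.3f}, sd {sd:.3f}, max density gap {max_gap:.2e}")
```

The two densities agree up to discretisation error, which is all that "provably correct" means in this restricted setting.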

However, when we investigate an indirect intervention we are typically looking at estimates derived non-trivially from the world; as a result, P(C|xE) ≠ P(C|x), and the BA framework breaks down. Put another way, when we look for estimates and find one, we have learned something about the world. If we don’t account for this properly, we will draw incorrect conclusions.
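A toy illustration of the failure, with everything in it hypothetical: take x, E and C to be binary propositions with a made-up joint distribution in which C and E are correlated even once x is known. The sketch below computes the true P(x|EC) and the normalised BA product side by side.

```python
import numpy as np

# Hypothetical joint distribution P(x, E, C), indexed as joint[x, e, c].
# The numbers are invented purely for illustration and sum to 1.
joint = np.array([
    [[0.20, 0.10],    # x=0, e=0: (c=0, c=1)
     [0.20, 0.10]],   # x=0, e=1: (c=0, c=1)
    [[0.10, 0.05],    # x=1, e=0: (c=0, c=1)
     [0.05, 0.20]],   # x=1, e=1: (c=0, c=1)
])

def normalise(v):
    return v / v.sum()

p_x_given_e = normalise(joint[:, 1, :].sum(axis=1))   # P(x|E)
p_x_given_c = normalise(joint[:, :, 1].sum(axis=1))   # P(x|C)
p_x_given_ec = normalise(joint[:, 1, 1])              # P(x|EC), fully Bayesian
ba = normalise(p_x_given_e * p_x_given_c)             # BA product, normalised

# The condition that fails: C depends on E even after conditioning on x.
p_c_given_x1_e = joint[1, 1, 1] / joint[1, 1, :].sum()   # P(C|x=1, E)
p_c_given_x1 = joint[1, :, 1].sum() / joint[1].sum()     # P(C|x=1)

print("P(x=1|EC) =", round(float(p_x_given_ec[1]), 4))
print("BA answer =", round(float(ba[1]), 4))
print("P(C|x=1,E) =", round(float(p_c_given_x1_e), 4),
      " P(C|x=1) =", round(float(p_c_given_x1), 4))
```

In this table the BA answer for x=1 is noticeably lower than the fully Bayesian one. The direction and size of the gap depend entirely on the invented numbers; the point is only that the two procedures need not agree.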

In particular, it is reasonable to expect that the existence of estimates implying unusual values for an intervention should positively correlate with background states of the world which permit unusual values for the intervention. The BA framework does not account for this, and so heuristically it will overly penalise estimates of interventions which yield results far from the prior distribution. Of course, we can reasonably ask whether it is feasible to compute P(x|EC) explicitly; in general fully Bayesian work is hard.

Jaynes (Probability Theory: The Logic of Science, Chapter 18) deals with a simpler example of the same basic problem, where we are asked to ascribe credence to a proposition like

A := “when I flip this coin it will come up heads”.

Instead of merely having a belief about the distribution over outcomes (analogous to P(x|E) in the BA case), it turns out to be necessary to keep track of a distribution over propositions of form:

A_p := “the subjective probability of Heads is p, regardless of any other evidence”;

or more formally we define P(A|A_pE) = p. The propositions A_p are therefore disjoint, and exactly one is true, so we have an object which behaves like a probability distribution over A_p; we can abuse terminology and use probability directly. Jaynes then shows that:

P(A) = ∫p P(A_p) dp

And so we can recover P(A) from the P(A_p). The full A_p distribution is needed to formalise confidence in one’s estimate. For example, if one is sure from background data E that the coin is completely biased, then one trial flip F will reveal which way the coin is biased, and so P(A|EF) will be almost 0 or 1, whilst P(A|E) = ½. On the other hand, if one has background information E’ that of 10000 trial flips 5000 were heads, then one additional trial flip F leaves P(A|E’F) ≈ P(A|E’) = ½. Jaynes shows that the A_p distribution screens off E, and can be updated in light of new data F; the posterior P(A|EF) is then the mean of the new A_p distribution. In this framework, and starting from a uniform prior over A_p, Laplace’s law of succession is derived.
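The two coin examples can be reproduced in a few lines. This is a sketch rather than Jaynes’ own construction: the "completely biased" background is modelled as a two-point A_p distribution on p = 0 and p = 1 (an idealisation, so the update lands exactly on 1 rather than "almost" 1), and the well-tested coin is modelled as a uniform prior updated on 5000 heads and 5000 tails, which gives the rule of succession (s + 1)/(n + 2).

```python
from fractions import Fraction

def update_on_flip(ap, heads):
    """Bayesian update of a discrete A_p distribution {p: weight} on one flip."""
    unnormalised = {p: w * (p if heads else 1 - p) for p, w in ap.items()}
    total = sum(unnormalised.values())
    return {p: w / total for p, w in unnormalised.items()}

def prob_heads(ap):
    """P(A) is the mean of the A_p distribution."""
    return sum(p * w for p, w in ap.items())

def succession_prob_heads(heads_seen, tails_seen):
    """Mean of A_p after updating a uniform prior on the observed counts:
    Laplace's rule of succession, (s + 1) / (n + 2)."""
    return Fraction(heads_seen + 1, heads_seen + tails_seen + 2)

# E: the coin is certainly two-headed or two-tailed, each equally likely.
fully_biased = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
before_biased = prob_heads(fully_biased)
after_biased = prob_heads(update_on_flip(fully_biased, heads=True))

# E': 5000 heads in 10000 trial flips, starting from a uniform prior.
before_tested = succession_prob_heads(5000, 5000)
after_tested = succession_prob_heads(5001, 5000)

print("biased coin:", before_biased, "->", after_biased)
print("tested coin:", before_tested, "->", after_tested)
```

Both backgrounds start at P(A) = ½, but one extra head moves the first all the way to 1 and the second only to 5002/10003. The point estimate alone cannot distinguish the two; the A_p distribution can.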

To generalise this framework to estimating a real value x rather than a binary outcome A, we can shift from a distribution A_p over probabilities of A to a distribution P(X_d) over distributions for x, with X_d := “x ~ d regardless of other evidence”¹. In this setting, there will still be a “point estimate” distribution X, the mean of X_d, which summarises your current beliefs about x. Other information about X_d is needed to allow you to update coherently in response to arbitrary new information. In such a case, new information may cause one to substantially change the distribution X_d, and thus one’s beliefs about the world, if this new information causes a great deal of surprise conditional on X.

Examples and Priors in the BA framework

The mathematics can also reveal when an intuition pump is bringing in extra information in a non-obvious way. For example, some of the examples given for how the BA framework should run had the apparently unintuitive feature that successively larger claims of impact eventually lead to decreasing posterior means. This turns out to be because the standard deviation of each estimate was presumed to be roughly equal to its mean.
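The effect is easy to reproduce with the usual normal-normal update. The sketch below assumes a prior with mean 0 and standard deviation 1 (my choice for illustration, not a figure taken from any particular example) and compares two treatments of a claimed impact y: one where the estimate’s standard deviation is set equal to y, and one where it is held fixed at 1.

```python
def posterior_mean(estimate, estimate_sd, prior_mean=0.0, prior_sd=1.0):
    """Precision-weighted posterior mean for a normal prior and normal estimate."""
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    return (prior_precision * prior_mean + estimate_precision * estimate) / (
        prior_precision + estimate_precision
    )

claims = [0.5, 1.0, 2.0, 10.0, 100.0]

# Standard deviation tracks the claim: posterior mean is y / (y**2 + 1).
sd_tracks_claim = [posterior_mean(y, estimate_sd=y) for y in claims]

# Standard deviation held fixed at 1: posterior mean is y / 2.
sd_fixed = [posterior_mean(y, estimate_sd=1.0) for y in claims]

for y, tracked, fixed in zip(claims, sd_tracks_claim, sd_fixed):
    print(f"claim {y:>6}: sd=claim -> {tracked:.4f}   sd=1 -> {fixed:.4f}")
```

With the standard deviation tied to the claim, the posterior mean peaks at a claim of 1 and then falls towards the prior mean; with a fixed standard deviation it keeps rising. The turnaround comes from the assumed noise model rather than from the BA update itself.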

De facto this means that the new evidence was prohibited a priori from suggesting that an intervention was better than the prior mean with high probability. In general, this need not hold, if we are able to find data which is reasonably constrained and not present in the background model. If we intend to also account for the possibilities of errors in cognition, then this kind of treatment of new evidence seems more reasonable, but then we should see similar broadening in our background prior.

Similarly, as the stated BA priors are normal or log-normal, they assert that the event E := “the range of intervention impact ratios is large” has very low probability. Some decay is necessary to prevent arbitrarily large impacts dominating, which would make expected value computations fail to converge. Practically, this implies that a stated prior for impacts drops off faster than 1/impact² above some impact², but this does not in and of itself mandate a specific form of prior, nor specify the point above which the prior should drop rapidly, nor the absolute rate of the drop off. In particular, the log-normal and normal priors drop off much faster, and so are implicitly very confident that the range of impacts is bounded by what we’ve already seen.
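To make the convergence requirement concrete, consider a purely illustrative power-law tail: a density proportional to impact^(-alpha) on impact ≥ 1 (a Pareto form, chosen only because the integral has a closed form). The contribution to the expected impact from the range 1 to T stays bounded as T grows only when alpha exceeds 2.

```python
import math

def truncated_mean(alpha, cutoff):
    """Contribution to E[impact] from 1 <= impact <= cutoff, for the density
    (alpha - 1) * impact**(-alpha) on impact >= 1 (an illustrative choice)."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the density to normalise")
    if math.isclose(alpha, 2.0):
        # Boundary case: the integral grows like log(cutoff).
        return math.log(cutoff)
    return (alpha - 1) / (alpha - 2) * (1 - cutoff ** (2 - alpha))

cutoffs = [1e3, 1e6, 1e9]
alphas = [1.5, 2.0, 2.5, 3.0]
table = {a: [truncated_mean(a, t) for t in cutoffs] for a in alphas}

for a in alphas:
    row = "  ".join(f"{value:12.4f}" for value in table[a])
    print(f"alpha={a}: {row}")
```

For alpha = 1.5 and alpha = 2 the partial expectation keeps growing with the cutoff, whereas for alpha = 2.5 and alpha = 3 it settles down. A normal or log-normal tail falls off far faster than any of these, which is the sense in which those priors are much more confident about the upper range than convergence alone requires.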

What is the range of impacts for interventions?

It is not trivial to find out what kinds of ratios we should expect to see; for these purposes it is unfortunate that Givewell does not publish $/DALY or $/life estimates of impact for the majority of the charities it assesses. It would be very useful to see what kinds of impacts are being sampled at the low end. Other studies (e.g. DCP2) have assessed some hundreds of high and low impact interventions in public health, and assert 10000:1 ratios in impact, with their best $/DALY numbers consistent with Givewell’s assessment that AMF is likely to be one of the better public health interventions available.

Of course, we also strongly suspect that there exist interventions with better impacts than AMF, if we are willing to look outside public health. Givewell’s raison d’être is that one can gain leverage in moving funds from ineffective causes to effective ones, and so a dollar spent on Givewell should move much more than a dollar to effective interventions. In principle this demonstrates that the range of possible intervention impacts may be much larger than the range available in specific fields, such as developing world health interventions.

By the lights of the BA prior, we are uncharitable about an estimate of impact if we assert it is large, in that this makes the estimate implausible and thus heavily discounted. In this sense, existential risk reduction has been sketchily and optimistically estimated at around $0.125/life, which we can take as an uncharitable estimate for the BA framework. If this estimate were correct, that would only require the existence of an intervention which is to AMF as AMF is to the least effective health interventions. It does not seem easy to confidently assert that the tail thickness and variance of the distribution of intervention impacts are such that the apparently observed ratios in public health interventions and Givewell are common enough that they can be searched for, whilst ruling out a priori the credibility of estimates at the <$1/life level.

Now, it might be possible that these very high impact interventions are not easy to scale up, or are rare enough that it is not worth searching for them. On the other hand, we can free-ride on other people recommending interventions, if we are willing to accept internal or inside view assessments as substantively credible.

Confidence and Probability

It seems clear that the probability of a proposition and one’s confidence in the quality of one’s assessment are distinct, although it is easy to confuse language by referring to confidence in a proposition, rather than in a probability or estimate. Fully rigorously, this is encompassed in the distribution over X_d, but in practice we may wish to track only a single posterior distribution³.

Other commenters have suggested a similar distinction between confidence and probability, observing that once we have seen that the computations exist, the correct response is to say “I notice that I am confused”. More formally, in practice we have neither P(x|C) nor P(x|E). We have to also condition on some event like:

N := “My modelling and computations are correct”.

Ideally one would have extensive tests of all of the pieces of a methodology, so that one could say something about which classes of interventions are well modelled, but practically this may excessively complicate the issue. A priori, it seems unreasonable to attach >> 1-1/1000 probability to propositions like N for a new method or model which has merely been output by human cognition. Assessing high confidence would be expected to wait on assessing the reliability and calibration of the methodology, or showing that the model is a stable output of cognition.

In the event of a computation and a point prior belief about interventions disagreeing, a Bayesian update will reduce confidence in N, and also come to believe that the processes leading to the estimate C are less reliable. This is separate to the process which causes you to extract beliefs about this particular intervention. Whether the background model is substantively changed or the estimation procedure is discounted is a matter for your relative confidence in these processes, and the sensitivity of the outputs of the processes.
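A rough sketch of how such an update on N might look, under assumptions that are entirely mine: if N holds, the estimate is the true impact (drawn from a normal prior) plus normal noise; if N fails, the estimate is drawn from a much broader normal that is unrelated to the true impact. The standard deviations and the starting confidence of 1 - 1/1000 are illustrative placeholders.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_confidence_in_model(estimate, prior_n, prior_mean=0.0,
                                  prior_sd=1.0, noise_sd=1.0, broken_sd=100.0):
    """P(N | estimate) by Bayes' rule under two hypothetical predictive models.

    If N holds: estimate = impact + noise, so it is N(prior_mean, prior_sd^2 + noise_sd^2).
    If N fails: estimate is N(prior_mean, broken_sd^2), unrelated to the impact.
    """
    like_n = normal_pdf(estimate, prior_mean, math.hypot(prior_sd, noise_sd))
    like_not_n = normal_pdf(estimate, prior_mean, broken_sd)
    numerator = prior_n * like_n
    return numerator / (numerator + (1.0 - prior_n) * like_not_n)

prior_n = 1.0 - 1.0 / 1000.0
unsurprising = posterior_confidence_in_model(1.0, prior_n)
surprising = posterior_confidence_in_model(10.0, prior_n)

print(f"estimate  1 sd-unit from prior mean: P(N|C) = {unsurprising:.6f}")
print(f"estimate 10 sd-units from prior mean: P(N|C) = {surprising:.2e}")
```

An estimate close to what the prior expects slightly raises confidence in N, while one far out in the tail drives it close to zero, even from a high starting point. How much of the disagreement is then charged to the estimation procedure versus the background model depends on the relative widths assumed here, which is exactly the question of relative confidence raised above.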

Conclusions

Disagreements over how to estimate the impact of an intervention on the world have existed for some time, and it seems that the grounds for these disagreements are not being well addressed. In general, it would be a good thing for our grounds for confidence in arguments and background priors to be made very explicit and open. In principle we can then reduce these disagreements to matters of fact and differences in prior beliefs.

In the particular case of Givewell, it is clear that they have assessed a great many interventions systematically, and seem to possess a great deal of confidence in their modelled backgrounds. I do not know if there has been a formal process of checking the calibration of these estimates; if there has been, and so Givewell can assert high confidence (say » 10 bits) in propositions of the form “our model is emitting a suitable background correct for this class of interventions”, then the methods are very likely to be highly valuable to the wider EA community for other purposes, and ideally would be distributed.

Notes

I wrote this post whilst a visiting fellow at MIRI; Lukeprog asked that I take a further look at LW’s debates on cost effectiveness stability in effective altruism, and try to clarify the situation if possible.

I am grateful to Carl Shulman, Luke Muehlhauser and Adam Casey for their substantive feedback and comments on early drafts of this post.

1 To follow the modified mathematics of Jaynes’ derivation closely, we amend 18-1 to read P(X = x|X_dE) = d(x) for any distribution d, and then follow Jaynes’ derivation formally. It is reasonable to be worried that the space of distributions is not measurable; this can be fixed by restricting to a sigma-algebra of functions which are piecewise constant (or alternatively running Jaynes’ original approach on the set of binary propositions A_yz := “y ≤ x ≤ z” for all y and z).

2 We could also assert strong cancellation properties, but it is unclear whether these effects can be substantial in practice. Technically, we also could get convergence with drop offs like 1/(n² log² n) or 1/(n² log n log² log n), but the distinction is slight for the purposes of discussion; they are much slower than a normal.

3 If we work with the set of A_yz propositions instead, then Jaynes implies we have to hold a set of distributions (A_yz)_p, which is rather more tractable than X_d, although harder to visualise concretely.