There’s an additional issue concerning imperfect evaluation. Suppose we made a charity evaluator based on Statistical Prediction Rules (SPRs), which perform pretty well. The problem is that the charities will try to fake the signals the SPR evaluates, and an SPR is too crude to resist deliberate cheating. Diversification then decreases the payoff for such cheating; sufficient diversification can make it economically nonviable for selfish parties to fake the signals. The same goes for any imperfect evaluation scheme, and especially for one that does elaborate processing of information (statements, explanations, suggestions on how to perform the evaluation, et cetera) originating from the donation recipient.
You just cannot abstract an imperfect evaluation as ‘uncertainty’ any more than you can abstract a backdoor in a server application as noise on the wire.
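For concreteness, an SPR in this literature is usually just a simple actuarial formula, often a unit-weight checklist over a few observable cues. A minimal sketch (the cue names and the unit weights are invented for illustration; a real SPR would fit its weights to outcome data):

```python
# A minimal sketch of an SPR-style charity evaluator. Cue names are
# hypothetical; each cue is exactly the kind of signal a charity could game.
CUES = ["publishes_financials", "independent_evaluation", "low_overhead"]

def spr_score(charity: dict) -> int:
    """Unit-weight SPR: count how many binary cues the charity exhibits."""
    return sum(1 for cue in CUES if charity.get(cue))

print(spr_score({"publishes_financials": True, "low_overhead": True}))  # 2
```

The crudeness is visible in the sketch: nothing distinguishes a charity that genuinely has these properties from one that merely arranges to appear to have them.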
Diversification then decreases the payoff for such cheating; sufficient diversification can make it economically nonviable for selfish parties to fake the signals.
Diversification reduces the payoff for appearing better. Therefore it reduces the payoff of investing in fake signals of being better. But it also reduces the payoff of investing in actually being better! If a new project that increases humanitarian impact also increases donations enough, then charities can afford to expand such efforts. If donations are insensitive to improvement, then the new project will be unaffordable.
Thus, e.g., GiveWell overwhelmingly channels funding to its top pick at a given time, partly to increase the expected direct benefit, and partly because they think this creates incentives for real improvement that dominate the incentives for fake improvement. If the evaluation methods are worth using at all, they will include signals that are costlier to fake than to signal honestly.
In the limit, if donors ignored quality indicators and spread donations evenly among all charities, all this would do is incentivize the formation of lots of tiny charities that don’t do anything at all and just collect most of the diversification donations. If you can’t distinguish good from bad, you should focus on improving your ability to distinguish between them, not blindly diversify.
Diversification reduces the payoff for appearing better. Therefore it reduces the payoff of investing in fake signals of being better. But it also reduces the payoff of investing in actually being better!
Good charities are motivated by their objective. It’s rather the bad charities for which actually performing better is simply one of the means of looking good, for the sake of some entirely different terminal goal. You are correct about the latter.
If you can’t distinguish good from bad, you should focus on improving your ability to distinguish between them, not blindly diversify.
I do concede that an unusually careful and secure (in the software-security sense) evaluation may be sufficiently resistant to cheating.
However, if you parsed potentially Turing-complete statements from a prospective charity, verified the statements for approximate internal consistency, and then, as a result of this clearly insecure process, obtained an enormously high number of, say, 8 lives per dollar, that’s an entirely different story. If your evaluation process has a security hole, the largest number that falls through it will be a scam.
edit:
In the limit, if donors ignored quality indicators and spread donations evenly among all charities, all this would do is incentivize the formation of lots of tiny charities that don’t do anything at all and just collect most of the diversification donations. If you can’t distinguish good from bad, you should focus on improving your ability to distinguish between them, not blindly diversify.
Wrong limit. The optimal amount of diversification depends on how secure the evaluation process is (i.e., how expensive it is for someone to generate a ‘donation basilisk’ output which, upon reading, compels the reader to donate). Yes, ideally you should entirely eliminate the possibility of such ‘donation basilisk’ data and then donate to the top charity. Practically, though, the degree of basilisk-proofness is a given that is very difficult to change, and you are making the donation decision in the now.
I find this nonobvious; could you elaborate?
Diversification then decreases the payoff for such cheating; sufficient diversification can make it economically nonviable for selfish parties to fake the signals.
I agree with the arguments against diversification (mainly because of its effect of lowering the incentive to become more efficient), but here’s a concrete instance of how diversification could make cheating nonviable.
Example: faking the signals costs $5,000 (in other words, it costs $5,000 to make it look like you’re the best charity). There are $10,000 of efficient altruism funds that will be directed to the most efficient charity. By faking the signals, you net $5,000.
Now suppose diversification is used, and at most 1/4 of the efficient altruism funds will be directed to any given charity (say, by evenly splitting the funds among the top 4 charities). Faking the signals now nets −$2,500. Thus, diversification lowers the incentive to cheat by reducing the expected payoff.
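The arithmetic above can be sketched as a tiny function (the dollar figures are the hypothetical ones from the example):

```python
# Hypothetical numbers from the example: the cost of faking the signals
# and the size of the donor pool are illustrative assumptions.
FAKE_COST = 5_000   # cost to appear to be the best charity
FUNDS = 10_000      # total funds directed by the evaluator

def cheating_payoff(top_n: int) -> int:
    """Net payoff of faking top status when funds are split
    evenly among the top `top_n` charities."""
    return FUNDS // top_n - FAKE_COST

print(cheating_payoff(1))  # 5000: cheating pays with a single winner
print(cheating_payoff(4))  # -2500: a net loss under 4-way diversification
```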
Suppose we made a charity evaluator based on Statistical Prediction Rules (SPRs), which perform pretty well.
Is that just vanilla linear regression?
Diversification then decreases payoff for such cheating; sufficient diversification can make it economically non viable for selfish parties to fake the signals.
Even without cheating, evaluation is still problematic:
Suppose you have a formula that computes the expected marginal welfare (QALYs, etc.) of a charity given a set of observable variables. You run it on a set of charities, and the two top charities get very close scores, one slightly greater than the other. But the input variables are all affected by noise, and the formula contains several approximations, so you perform an error-propagation analysis, and it turns out that the difference between the two scores is within the margin of error.
Should you still donate everything to the top scoring charity even if you know that the decision is likely based on noise?
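A toy simulation of this situation (the true scores and the noise level are made up) shows how often measurement noise alone flips the ranking of two nearly tied charities:

```python
import random

random.seed(0)

TRUE_A, TRUE_B = 10.0, 10.1   # invented true scores: B is slightly better
NOISE_SD = 0.5                # measurement noise larger than the gap

TRIALS = 10_000
flips = 0
for _ in range(TRIALS):
    measured_a = TRUE_A + random.gauss(0, NOISE_SD)
    measured_b = TRUE_B + random.gauss(0, NOISE_SD)
    if measured_a > measured_b:  # noise makes the worse charity look better
        flips += 1

# Roughly 0.44 with these numbers: the worse charity "wins" almost half the time.
print(flips / TRIALS)
```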
Should you still donate everything to the top scoring charity even if you know that the decision is likely based on noise?
If the charities are this close then you only expect to do very slightly better by giving only to the better scoring one. So it doesn’t matter much whether you give to one, the other, or both.
Systematic errors are the problem.
Ideally, you run your charity-evaluator function on a huge selection of charities, and the one for which the function gives the largest value is, in some sense, the best, regardless of the noise.
More practically, imagine an imperfect evaluation function that, due to a bug in its implementation, multiplies by a Very Huge Number the value of any charity whose description includes some string S that the evaluation function mis-processes in some dramatic way. Now, if the selection of charities is big enough to include at least one charity with such an S in its description, you are essentially donating at random. Or worse than at random, because the people who run in their heads the computation that produces such an S tend not to be the ones you can trust.
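A toy illustration of this failure mode (all names, scores, and the trigger condition are invented): a single buggy branch in the evaluator lets one pathological entry dominate the selection, no matter how good the honest charities are.

```python
# Invented stand-ins: TRIGGER plays the role of the string S that the
# evaluator mis-processes; VERY_HUGE_NUMBER is the bug's multiplier.
TRIGGER = "S"
VERY_HUGE_NUMBER = 1e9

def buggy_evaluate(charity: dict) -> float:
    score = charity["honest_score"]
    if TRIGGER in charity["description"]:
        score *= VERY_HUGE_NUMBER  # the bug: one input pattern dominates
    return score

charities = [
    {"name": "good", "honest_score": 8.0, "description": "bed nets"},
    {"name": "scam", "honest_score": 0.1, "description": "contains S"},
]

best = max(charities, key=buggy_evaluate)
print(best["name"])  # "scam": one triggering entry beats every honest score
```

With a large enough pool of charities, some entry will hit the bug, so the argmax stops tracking honest quality entirely.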
Normally, I would expect people who know about human biases not to assume that the evaluation would resemble the ideal, and to understand that the output of some approximate evaluation will not have the exact properties of an expected value.