Would some version of this still work if you have imprecise credences about the signs (and magnitudes) of considerations you’ll come up with, rather than 50-50 (or some other ratio)?
If the credences are precise but not 50-50, we could adjust the donation amounts to match the probabilities and keep the expected amount donated at $1000.
But if the probabilities are imprecise, I don’t think we can (precisely) maintain the expected donation amount. We could pick donation amounts such that, across our set of credences (representor), exactly $1000, <$1000 and >$1000 are all possible expected donation amounts.
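To make the arithmetic concrete, here is a minimal sketch. It assumes a simplified two-outcome setup that is not spelled out above: you donate `amount_if_pos` if the next consideration turns out positive and `amount_if_neg` otherwise, and the representor is just an interval `[p_lo, p_hi]` of probabilities for "positive". All function and parameter names are hypothetical.

```python
def expected_donation(p_pos, amount_if_pos, amount_if_neg):
    """Expected donation under a single precise credence p_pos."""
    return p_pos * amount_if_pos + (1 - p_pos) * amount_if_neg


def amount_for_target(p_pos, target=1000.0, amount_if_neg=0.0):
    """Solve p*a + (1 - p)*b = target for a, given b (assumes p_pos > 0)."""
    return (target - (1 - p_pos) * amount_if_neg) / p_pos


def expected_range(p_lo, p_hi, amount_if_pos, amount_if_neg):
    """Range of expected donations over an interval of credences.

    The expectation is linear in p, so its extremes are at the endpoints.
    Treating the representor as an interval is an assumption of this sketch.
    """
    ends = [
        expected_donation(p, amount_if_pos, amount_if_neg)
        for p in (p_lo, p_hi)
    ]
    return min(ends), max(ends)


# Precise but not 50-50: rescale the amount so the expectation stays at $1000.
a_precise = amount_for_target(0.25)
print(a_precise, expected_donation(0.25, a_precise, 0.0))

# Imprecise: tune the amount for p = 0.5, then look at the whole interval.
a_mid = amount_for_target(0.5)
lo, hi = expected_range(0.4, 0.6, a_mid, 0.0)
print(lo, hi)
```

With a precise credence of 0.25, the amount rescales to $4000 and the expectation stays at $1000. With credences anywhere in [0.4, 0.6] and the amount tuned for 0.5, the expected donation ranges from roughly $800 to $1200, so $1000 is only one of the possible expected amounts rather than the guaranteed one.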
Cool direction and results!
Can we prevent such an agent from having a preference to create agents that do resist shutdown?
EDIT: And if such an agent is going to create other agents anyway, can we get it to actually make sure those agents don’t resist shutdown either, rather than, say, being indifferent about whether they resist shutdown?