I do not argue that my idea is sane; however, I think your critique doesn’t do it justice. So let me briefly point out that:
measuring probabilities of world destruction is very hard; being able to measure them at the 1e-12 level seems very, very hard
It’s enough to use upper bounds. If we have, e.g., an additional module that checks our AI source code for errors, and such a module decreases the probability of one of the bits being flipped, we can use our risk budget to calculate how many such modules we need at minimum. Etc.
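To make that calculation concrete, here is a minimal sketch, assuming the checker modules fail independently; the per-module miss probability and the budget slice below are made-up numbers for illustration, not figures from this discussion.

```python
def residual_risk(p_miss: float, n_modules: int) -> float:
    # The error slips through only if every independent checker misses it,
    # so p_miss ** n_modules is an upper bound on the risk from this source.
    return p_miss ** n_modules

p_miss = 1e-3   # hypothetical probability that one checker module misses the error
budget = 1e-10  # hypothetical slice of the overall risk budget for this error source

n = 1
while residual_risk(p_miss, n) > budget:
    n += 1
print(f"need at least {n} checker modules")  # -> 4 with these numbers
```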
How should it decide what budget level to give itself?
It doesn’t. You don’t build any intelligent system without a risk budget. Initial budgets are distributed to humans, e.g. 10^-15 to each human alive in 2016.
looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer
If utility is dominated by survival of humanity, then simple utility maximization is exactly the same as reducing total “existential risk emissions” in the sense I want to use them above.
Whether or not your utility is dominated by survival of humanity is an individual question.
the budget replenishes
Not at all. A risk budget is decreased by your best estimate of your total risk “emission”, which is what fraction of the future multiverse (weighted by probability) you spoiled.
So I think budgets are the wrong way to think about this—they rely too heavily on subjective perceptions of risk, they encourage being too cautious (or too risky) instead of seeing tail risks as linear in probability, and they don’t update on survival when they should.
Quite likely they are—but probably not for these reasons.
You don’t build any intelligent system without a risk budget. Initial budgets are distributed to humans, e.g. 10^-15 to each human alive in 2016.
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, rather than chosen because it’s three hands’ worth of fingers on a log scale based on two hands’ worth of fingers.
Whether or not your utility is dominated by survival of humanity is an individual question.
If it isn’t, how do you expect the agent to actually stick to such a budget?
Not at all. A risk budget is decreased by your best estimate of your total risk “emission”, which is what fraction of the future multiverse (weighted by probability) you spoiled.
I understood your proposal. My point is that it doesn’t carve reality at the joints: if you play six-chambered Russian Roulette once, then one sixth of your future vanishes, but given that it came up empty, you still have 100% of your future, because conditioning on the past in the branch where you survive eliminates the branch where you fail to survive.
What you’re proposing is a rule where, if your budget starts off at 1, you only play it six times over your life. But if it makes sense to play it once, it might make sense to play it many times—playing it seven times, for example, still gives you a 28% chance of survival (assuming the chambers are randomized after every trigger pull).
Which suggests a better way to put my point: you’re subtracting probabilities when it makes sense to multiply probabilities. You’re penalizing later risks as if they were the first risk to occur, which leads to double-counting, and means the system is vulnerable to redefinitions. If I view the seven pulls as independent events, it depletes my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.
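Spelling out the arithmetic (this only reproduces the numbers above, under the stated assumption that the chambers are re-randomized before each pull):

```python
p_fire = 1 / 6   # chance a single trigger pull fires
pulls = 7

survival  = (1 - p_fire) ** pulls   # ≈ 0.28: the 28% chance of surviving all seven pulls
one_event = 1 - survival            # ≈ 0.72: depletion if the seven pulls count as one event
per_pull  = pulls * p_fire          # ≈ 1.17: depletion if each pull is deducted separately

print(survival, one_event, per_pull)
```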
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, rather than chosen because it’s three hands’ worth of fingers on a log scale based on two hands’ worth of fingers.
Of course. My point is to build all intelligent systems so that the probability that they hand themselves a new budget stays within our risk budget (which we choose arbitrarily).
If it isn’t, how do you expect the agent to actually stick to such a budget?
I hope that survival of humanity dominates the utility function of the people who build AI, and that they will do their best to carry it over to the AI. You can individually have another utility function, if it serves you well in your life (as long as you don’t build any AIs). But that was the wrong way to answer your previous point:
One, it looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer, and being more or less cautious than that suggests is a mistake (at least, of how the utility is measured).
Not in the case of multiple agents who cannot easily coordinate. E.g., what if each human’s utility function makes it look reasonable to take a 1/1000 risk of destroying the world for potentially huge personal gains?
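As a rough illustration of why individually reasonable-looking risks don’t aggregate, here is a sketch assuming, purely hypothetically, that each person’s gamble is independent, and taking roughly the 2016 population:

```python
p_per_person = 1 / 1000       # each person's individually "reasonable" risk
n_people = 7_400_000_000      # roughly the number of humans alive in 2016

# The world survives only if every single gamble comes up safe.
survival = (1 - p_per_person) ** n_people
print(survival)  # underflows to 0.0; the true value is on the order of exp(-7.4e6)
```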
If I view the seven pulls as independent events, it depletes my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.
I am well aware of this, but the effect is negligible when we are speaking of small probabilities: for small p, 1-(1-p)^n ≈ np, so the two accounting rules agree to first order.
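A quick numerical check, with made-up small numbers:

```python
p, n = 1e-6, 100   # hypothetical per-event risk and number of events

subtracted = n * p              # budget-style accounting: 1.0e-4
one_event  = 1 - (1 - p) ** n   # "treat it as one event" accounting: ≈ 9.9995e-5

print(subtracted - one_event)   # ≈ 5e-9, negligible at this scale
```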