You don’t build any intelligent system without a risk budget. Initial budgets are distributed to humans, e.g. 10^-15 to each human alive in 2016.
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, rather than chosen because it's three hands' worth of fingers on a log scale based on two hands' worth of fingers.
Whether or not your utility is dominated by survival of humanity is an individual question.
If it isn’t, how do you expect the agent to actually stick to such a budget?
Not at all. A risk budget is decreased by your best estimate of your total risk "emission": the fraction of the future multiverse (weighted by probability) that you spoiled.
I understood your proposal. My point is that it doesn't carve reality at the joints: if you play six-chambered Russian Roulette once, then one sixth of your future vanishes, but given that the chamber came up empty, you still have 100% of your future, because conditioning on the past in the branch where you survive eliminates the branch where you fail to survive.
What you're proposing is a rule where, if your budget starts off at 1, you can only play it six times over your life. But if it makes sense to play it once, it might make sense to play it many times: playing it seven times, for example, still gives you a 28% chance of survival (assuming the chambers are randomized after every trigger pull).
Which suggests a better way to make my point: you're subtracting probabilities when it makes sense to multiply them. You're penalizing later risks as if each were the first risk to occur, which double-counts and leaves the system vulnerable to redefinitions. If I view the seven pulls as independent events, they deplete my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.
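The gap between the two accounting rules is easy to check numerically. A quick sketch, using the 1/6-per-pull roulette setup from the example above:

```python
# Seven pulls of six-chambered Russian roulette, each pull an
# independent 1/6 chance of death (chambers re-randomized each time).
p = 1 / 6
n = 7

# Linear accounting: subtract each pull's probability from the budget.
linear_depletion = n * p  # 7/6 ~ 1.17, which exceeds a budget of 1

# Multiplicative accounting: probability of dying on at least one pull.
true_depletion = 1 - (1 - p) ** n  # 1 - (5/6)^7 ~ 0.72

# Survival probability: the ~28% figure quoted above.
survival = (1 - p) ** n

print(f"linear: {linear_depletion:.3f}")
print(f"true:   {true_depletion:.3f}")
print(f"alive:  {survival:.3f}")
```

Linear accounting forbids the seventh pull outright (the budget is already overdrawn at six), while the multiplicative view says seven pulls consume only about 72% of a unit budget.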
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, rather than chosen because it's three hands' worth of fingers on a log scale based on two hands' worth of fingers.
Of course; my point is to build all intelligent systems so that they do not hand themselves a new budget, except with a probability that fits within our risk budget (which we choose arbitrarily).
If it isn’t, how do you expect the agent to actually stick to such a budget?
I hope that survival of humanity dominates the utility function of the people who build AI, and that they will do their best to carry it over to the AI. You can individually have another utility function, if it serves you well in your life (as long as you don't build any AIs). But that was the wrong way to answer your previous point:
One, it looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer, and being more or less cautious than it suggests is a mistake (or at least a mistake in how the utility is measured).
Not in the case of multiple agents who cannot easily coordinate. E.g. what if each human's utility function makes it look reasonable to take a 1/1000 risk of destroying the world, for potential huge personal gains?
If I view the seven pulls as independent events, they deplete my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.
I am well aware of this, but the effect is negligible when we are dealing with small probabilities.