Great idea! A couple of thoughts/questions:
Would comparing a high-risk, short-term model against its low-risk, long-term equivalent be a useful predictive comparison? Perhaps that change alone could alter the friendliness? Or perhaps such a short-term AI would be so useless that candidates would be rejected for functional rather than ethical reasons?
Could the tests you propose be passed by many types of unfriendly AIs? For example, a paperclip maximiser might not press the “thermonuclear destruction” button because doing so might destroy some of the world’s paperclips.
I’m having trouble imagining how risk-aversion/appreciation would actually work in practice. Wouldn’t an AI always select whatever route it judged optimal, regardless of risk, unless it actually had some kind of preference for suboptimal options? In other words, other goals?
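To make the question concrete, here is a toy sketch (the lotteries, payoffs, and square-root utility are illustrative assumptions of mine, not anything from the post): an expected-value maximiser ranks options purely by mean payoff, while the same chooser with a concave utility function picks the safer, lower-mean option.

```python
import math

# Two hypothetical lotteries as (probability, payoff) pairs.
# "risky" has the higher expected value; "safe" has lower variance.
risky = [(0.5, 100.0), (0.5, 0.0)]   # expected value = 50
safe  = [(1.0, 40.0)]                # expected value = 40

def expected_utility(lottery, utility):
    """Expected utility of a lottery under a given utility function."""
    return sum(p * utility(x) for p, x in lottery)

linear = lambda x: x             # risk-neutral: utility linear in payoff
concave = lambda x: math.sqrt(x) # risk-averse: diminishing returns

for name, u in [("risk-neutral", linear), ("risk-averse", concave)]:
    eu_risky = expected_utility(risky, u)
    eu_safe = expected_utility(safe, u)
    choice = "risky" if eu_risky > eu_safe else "safe"
    print(f"{name}: EU(risky)={eu_risky:.2f}, EU(safe)={eu_safe:.2f} -> picks {choice}")
```

The only difference between the two agents is the curvature of the utility function, and that curvature is exactly what I’m unsure whether to count as an “other goal”.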
Provided your other assumptions are all true, perhaps simply the discount rate would be enough?
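To sketch what I have in mind (all numbers here are invented for illustration): under exponential discounting, a steep enough discount rate makes even an enormous delayed penalty vanish next to a trivial immediate reward, which is the kind of short-horizon behaviour the comparison above would probe.

```python
# Toy example: present value of an action that pays a small immediate
# reward but incurs a large penalty `delay` steps in the future.
def discounted_value(immediate_reward, delayed_penalty, delay, gamma):
    """Present value under exponential discounting with factor gamma."""
    return immediate_reward - (gamma ** delay) * delayed_penalty

for gamma in (0.99, 0.9, 0.8):
    v = discounted_value(immediate_reward=10.0,
                         delayed_penalty=1_000_000.0,
                         delay=100,
                         gamma=gamma)
    print(f"gamma={gamma}: present value = {v:,.2f}")
```

The flip between gamma = 0.9 (the penalty still dominates) and gamma = 0.8 (the agent happily takes the immediate reward) is the effect I mean.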