This may have an obvious response, but I can’t quite see it: If the worst possible thing is a negligible change, an easily achievable state, shouldn’t an AGI want to work to prevent that catastrophic risk? Couldn’t this cause terribly conflicting priorities?
If there is a minor thing that the AGI despises above all, surely some joker will make a point of trying to see what happens when they instruct their local copy of Marsupial-51B to perform the random inconsequential action.
It might be tempting to try to compromise on utopia to avoid a strong risk of the literal worst possible thing.
Apologies if there’s a reason why this is obviously not a concern :)
Yeah, that’s a known problem. I don’t quite remember what the go-to solutions were that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be made almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work, or it would still run into the problem of conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about an AI creating an infinite series of surrogate goals to protect the previous surrogate goal. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.
We’d want to pick something that would:
- have badness per unit of resources (or opportunity cost) only moderately higher than any actually bad thing according to the surrogate,
- scale like actually bad things according to the surrogate, and
- be extraordinarily unlikely to occur otherwise.

Maybe something like doing some very specific computations, or building very specific objects.
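The criteria above can be illustrated with a toy numeric model. Everything here is an illustrative assumption (the 1.5× factor, the linear scaling), not something from the original discussion — it just shows why "only moderately higher badness per unit of resources" and "scales like actually bad things" matter for threats transferring onto the surrogate:

```python
def real_disvalue(resources_spent: float) -> float:
    """Disvalue an adversary creates per unit of resources spent on an
    actually bad outcome (assumed linear, slope 1.0)."""
    return 1.0 * resources_spent


def surrogate_disvalue(resources_spent: float) -> float:
    """Disvalue, under the modified (surrogate) utility function, of the
    designated inconsequential action. Criterion 1: only moderately worse
    per unit of resources (here 1.5x). Criterion 2: scales the same way
    (linear in resources) as the actually bad thing."""
    return 1.5 * resources_spent


# An extortionist with a fixed budget gains little by threatening real
# harm instead of the surrogate: the disvalue ratio stays bounded (1.5x)
# at every scale, so threats can be deflected onto the harmless action
# without making threats much cheaper overall.
for budget in (1.0, 10.0, 100.0):
    ratio = surrogate_disvalue(budget) / real_disvalue(budget)
    print(budget, ratio)
```

The third criterion (extraordinarily unlikely to occur otherwise) isn’t visible in the numbers; it’s what keeps the "joker with a local copy" scenario from being cheap to trigger by accident or mischief.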