I upvoted this for being an interesting and useful contribution.
However, I must object to the last sentence:
I feel the Oracle is a more sensible “not full FAI” approach to look into.
You consistently hold this position, and I have yet to be impressed by it! Oracle AIs are tool AGIs like the type described in this post (and will destroy the world) if they are capable of answering questions about which actions we should take. And if they aren’t, they aren’t general intelligences but rather domain-specific ones. Add to that all the other reasons to think Oracle AI is an FAI-complete problem, which I needn’t list here as I’m sure you are familiar with them.
I think the premises of this post are all unrealistic: that a tool AI would not develop a utility function, that we could understand the options presented, and that it would be safe to read the options (though my position on the first is shakier and less well-founded, so I’m more likely to change it than the other two). You’ve done a good job of demonstrating that granting all of these would still produce an unsafe thing, but I think you would need both to solve all the problems highlighted in this post AND to achieve those three prerequisites with an extreme degree of certainty. I honestly don’t think that can be done.
Oracle AIs are tool AGIs like the type described in this post
Maybe the most salient difference is that Oracles are known to be dangerous, and we can think about how to use them safely, whereas tools were presented as just being intrinsically safe by definition.