Thanks again for your reply. I see your point that the world is complicated and a utility maximizer would be dangerous, even if the maximization is supposedly trivial. However, I don’t see how an achievable goal has the same problem. If my AI finds the answer, 2, before a meteor hits it, I would say it has solidly landed at 100% and stops doing anything. Your argument would hold if it decided to rule out all possible risks first, before even starting to look for the answer to the question, which it would otherwise find quickly. But since ruling out those risks would be much harder than finding the answer, I can’t see my little agent doing that.
I think my easy goals come closest to what you call other-izers. Any more pointers for me to find that literature?
Thanks for your help; it definitely helps me calibrate my thoughts!
Actually, I think 1+1 = ? is not really an easy enough goal, since it’s not 100% certain that the answer is 2. Getting to 100% certainty (including certainty about what I actually meant by that question) could still be nontrivial. But let’s say the goal is ‘delete filename.txt’? Maybe the trick is in the language.