I hope AGIs will be equipped with as many fail-safes as your argument rests on assumptions.
Fail-safes would be low-cost: if the AI can’t think of a way to beat them, it wasn’t the bootstrapping AI we were hoping for anyway, and might even be harmful, so it would be good to have the fail-safes in place.
I just don’t see how one could be sophisticated enough to create a properly designed AGI capable of explosive recursive self-improvement and yet fail so drastically at specifying its scope boundaries.
It seems to me that evolution-based algorithms could do the trick; see the sketch below.
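To make concrete what I mean by an evolution-based search, here is a minimal sketch (the objective, population size, and mutation scale are all made-up illustrative choices, not anyone’s actual design): a population of candidate parameter vectors is scored, the best are kept, and mutated copies fill the next generation.

```python
import random

def fitness(candidate):
    # Toy objective (illustrative only): prefer vectors close to a fixed target.
    target = [1.0, -2.0, 0.5]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def evolve(pop_size=50, dims=3, generations=100, mutation_scale=0.1):
    # Start from random candidates.
    population = [[random.uniform(-5, 5) for _ in range(dims)] for _ in range(pop_size)]
    for _ in range(generations):
        # Score everyone and keep the top half (truncation selection).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated copies of survivors.
        children = [[x + random.gauss(0, mutation_scale) for x in random.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

print(evolve())
```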
What is the difference between “a rule” and “what it wants”? You seem to assume that it cares to follow the rule that tells it to maximize a reward number, but doesn’t care to follow another rule that tells it to halt.
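As a toy illustration of the point (the names and structure here are hypothetical, not anyone’s proposed design): in code, the instruction to maximize a reward number and the instruction to halt are the same kind of thing, just rules the program follows; nothing makes one intrinsically more “wanted” than the other.

```python
import random

class ToyEnvironment:
    """Hypothetical stand-in environment; everything here is made up for illustration."""
    def actions(self):
        return [0, 1, 2]

    def predicted_reward(self, action):
        return action * random.random()

    def step(self, action):
        return float(action)

    def halt_signal(self):
        # The fail-safe condition -- here just a rare random trigger.
        return random.random() < 0.01

def run_agent(env, max_steps=1000):
    total_reward = 0.0
    for _ in range(max_steps):
        # Rule 1: choose the action predicted to maximize the reward number.
        action = max(env.actions(), key=env.predicted_reward)
        total_reward += env.step(action)
        # Rule 2: a rule that tells it to halt. Structurally it is the same kind
        # of thing as rule 1 -- just another rule the program follows.
        if env.halt_signal():
            break
    return total_reward

print(run_agent(ToyEnvironment()))
```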
Who says it wants to want what it wants? I don’t want to want what I want.