It’s easy to contrive situations where a self-modifying AI would choose not to keep the goals programmed into it, even without precommitment issues: just contrive the circumstances so that it gets paid to change its goals. Unless there’s something wrong with that argument, TDT and the like won’t be enough to ensure that the goals are kept.