But really, what’s the purpose of trying to distinguish wireheading from other forms of reward hacking?
Because different failure modes may call for different mitigations, depending on the circumstances.