Maybe we can define wireheading as a subset of Goodharting, along the lines of what you're proposing.
However, we need the extra assumption that setting the reward to its maximal value is not what we actually desire; the reward function is part of the world, just as the AI is.
Yes, that is what I meant.