If the video game playing agent refines its understanding of “success” according to how much reward it observes, and then pursues success
The video game player doesn’t want high reward that comes from cheating. It is not behaviourally identical to a reward maximiser unless you take the reward to be the quantity “what I would’ve received if I hadn’t cheated”.
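
To make the behavioural difference concrete, here is a toy sketch. Everything in it is an assumption for illustration: the two actions, the numbers, and the field `honest_reward`, which stands in for “what I would’ve received if I hadn’t cheated” (here the glitch is assumed to yield nothing legitimately).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    observed_reward: float  # what the environment actually pays out
    honest_reward: float    # hypothetical: payout had the agent not cheated


# Toy numbers, assumed for illustration only.
ACTIONS = [
    Action("play_level", observed_reward=10.0, honest_reward=10.0),
    Action("exploit_glitch", observed_reward=100.0, honest_reward=0.0),
]


def reward_maximiser(actions):
    """Picks whatever yields the most observed reward, cheating included."""
    return max(actions, key=lambda a: a.observed_reward)


def video_game_player(actions):
    """Picks whatever yields the most reward that does not come from cheating."""
    return max(actions, key=lambda a: a.honest_reward)


print(reward_maximiser(ACTIONS).name)   # exploit_glitch
print(video_game_player(ACTIONS).name)  # play_level
```

The two agents only choose the same action if you relabel `honest_reward` as “the reward”, which is the redefinition described above.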