David Johnston comments on [missing post]

David Johnston 3 Nov 2022 10:57 UTC
3 points

If the video game playing agent refines its understanding of “success” according to how much reward it observes, and then pursues success

The video game player doesn’t want high reward that comes from cheating. It is not behaviourally identical to a reward maximiser unless you take the reward to be the quantity “what I would’ve received if I hadn’t cheated”.
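To make the distinction concrete, here is a toy sketch, not a model of any real agent. The `Action` fields, the action names, and the numbers are all hypothetical, chosen only for illustration. It assumes each action carries both the reward the game actually emits and the reward it would have earned without cheating, and it shows that the two agents diverge on the first quantity and coincide only when the maximiser is handed the second.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Action:
    # All fields are hypothetical, for illustration only.
    name: str
    observed_reward: float          # reward the game actually emits
    reward_without_cheating: float  # "what I would've received if I hadn't cheated"


def maximise(actions: Sequence[Action], reward: Callable[[Action], float]) -> Action:
    """Pick the action that scores highest under the given reward function."""
    return max(actions, key=reward)


def reward_maximiser(actions: Sequence[Action]) -> Action:
    """Maximises whatever reward is observed, cheats included."""
    return maximise(actions, lambda a: a.observed_reward)


def game_player(actions: Sequence[Action]) -> Action:
    """Pursues success at the game, so reward gained by cheating does not count."""
    return maximise(actions, lambda a: a.reward_without_cheating)


# Made-up numbers: the glitch pays out more, but only by cheating.
ACTIONS = [
    Action("play_level", observed_reward=10.0, reward_without_cheating=10.0),
    Action("exploit_glitch", observed_reward=100.0, reward_without_cheating=0.0),
]

if __name__ == "__main__":
    print(reward_maximiser(ACTIONS).name)
    print(game_player(ACTIONS).name)
```

On these made-up numbers the two agents pick different actions. Swapping `observed_reward` for `reward_without_cheating` inside `reward_maximiser` would make them choose identically, which is the only sense in which the game player is a reward maximiser.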