I suspect the clearest way to think about this is to carefully distinguish between the RL “agent” as defined by a learned policy (a mapping from states to actions) and the RL algorithm used to train that policy.
The RL algorithm is designed to create an agent which maximises reward.
The “goal” of an RL policy may not always be clear, but using Dennett’s intentional stance we can define it as “the thing it makes sense to say the policy appears to be maximising”, i.e. the attribution that best compresses our observations of its behaviour.
Then I understand this post to be saying “The goal of an RL policy is not necessarily the same as the goal of the RL algorithm used to train it.”
Is that right?
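
For concreteness, here is a minimal sketch of the distinction I have in mind (the toy environment, the names, and the tabular Q-learning update are illustrative assumptions on my part, not anything from the post): the policy is nothing more than a state-to-action mapping, while reward maximisation only appears in the separate update procedure that rewrites that mapping.

```python
import random

# Illustrative sketch: the "agent"/policy is just a state -> action mapping;
# the RL algorithm is a separate procedure that edits that mapping so that
# observed reward goes up.

STATES = [0, 1]
ACTIONS = [0, 1]

def env_step(state, action):
    """Hypothetical toy dynamics: only action 1 in state 1 yields reward."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    next_state = random.choice(STATES)
    return next_state, reward

# The policy: nothing but a lookup table from states to actions.
policy = {s: random.choice(ACTIONS) for s in STATES}

def act(state):
    return policy[state]

# The RL algorithm: a crude epsilon-greedy tabular Q-learning loop that
# rewrites the policy to maximise reward.
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = random.choice(STATES)
for _ in range(10_000):
    action = random.choice(ACTIONS) if random.random() < epsilon else act(state)
    next_state, reward = env_step(state, action)
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    # Reward maximisation lives here, in the update rule; the policy itself is
    # just whatever mapping the update has left behind.
    policy[state] = max(ACTIONS, key=lambda a: q[(state, a)])
    state = next_state

print(policy)  # after training, state 1 maps to action 1
```

The point of the sketch is that nothing in the `policy` dictionary mentions reward at all; any “goal” we ascribe to it is an intentional-stance reading of the mapping the training loop happened to produce.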
I recently learned that the Starship Troopers movie started out like this.
To quote Wikipedia: