Good catch. I agree that “adaptation-executor” is a more appropriate term. Reward itself is no longer there, but the adaptations that are downstream from that reward still are.
It’s just that being deep fried in “RL juice” creates the kind of adaptations that can look very much like the AI still expects the reward to be there, and is still trying to maximize that phantom reward.
Good catch. I agree that “adaptation-executor” is a more appropriate term. Reward itself is no longer there, but the adaptations that are downstream from that reward still are.
It’s just that being deep fried in “RL juice” creates the kind of adaptations that can look very much like the AI still expects the reward to be there, and is still trying to maximize that phantom reward.