Oh, we’re using terminology quite differently then. I would not call (a) reward hacking, as I view the reward model as being the reward (to the RL process), whereas humans are not providing reward at all (but rather some data that gets fed into the reward model’s learning process). I don’t especially care about which definitions we use here, but I do wonder if this means we’re speaking past each other in other areas as well.