The Bing cases aren’t clearly specification gaming, since we don’t know how the model was trained or rewarded. My guess is that they’re probably just cases of unintended generalization. I wouldn’t really call this “goal misgeneralization”, but perhaps it’s similar.