ryan_greenblatt comments on Strongest real-world examples supporting AI risk claims?

ryan_greenblatt 5 Sep 2023 22:37 UTC
The Bing cases aren’t clearly specification gaming, since we don’t know how the model was trained or rewarded. My guess is that they’re just cases of unintended generalization. I wouldn’t really call this “goal misgeneralization”, but perhaps it’s similar.