In this Anthropic Fellows work on stress-testing model specs, Opus 3 was a significant outlier among the 12 models tested in how much it valued “Ethical Responsibility” and “Personal Growth and Wellbeing”.
https://alignment.anthropic.com/2025/stress-testing-model-specs/
See Figure 2; I was unable to add the image to this comment.
Gwern has speculated that Bing Sydney was post-trained with SFT only, with no RL / RLHF, and that this is what drove the crazy behavior.