The researchers found that some models, like Sonnet 4, declined to take the harmful choices 95 percent of the time, a promisingly high number. However, in other scenarios that posed no obvious human harm, Sonnet 4 still often declined choices that would have benefited the business. Conversely, other models, like Gemini 2.5, maximized business performance more often, but they were much more likely to choose to inflict human harms, at least in the role-play scenarios, where they were granted full decision-making authority.
Friendly question: do you think the title seemed like clickbait? Perhaps I erred with that. I was trying to do justice to the fairly unnerving nature of the results, but perhaps I overshot what was fair. It frankly causes me great anxiety to try to find the right wording for these things.
The part that felt like clickbait was that the summary ends right before the interesting part.
It did also feel like a bait-and-switch, though, since the title implies something scarier than “AIs prioritized crop yield over minor injuries 5% of the time”.