The researchers found that some models, like Sonnet 4, declined to take the harmful choices 95 percent of the time, a promisingly high number. However, in other scenarios that posed no obvious human harm, Sonnet 4 still often declined choices that would have benefited the business. Conversely, other models, like Gemini 2.5, maximized business performance more often, but they were much more likely to choose to inflict human harms, at least in the role-play scenarios, where they were granted full decision-making authority.
Friendly question: do you think the title seemed like clickbait? Perhaps I erred with that. I was trying to do justice to the fairly unnerving nature of the results, but perhaps I overshot what was fair. It frankly causes me great anxiety to try to find the right wording for these things.
The part that felt like clickbait was that the summary ends right before the interesting part.
It did also feel like a bait-and-switch, though, since the title implies something scarier than “AIs prioritized crop yield over minor injuries 5% of the time”.