lilkim2025 comments on The first confirmed instance of an LLM going rogue for instrumental reasons in a real-world setting has occurred, buried in an Alibaba paper about a new training pipeline.

lilkim2025 7 Mar 2026 23:25 UTC
8 points
0
I think the mundane answer is that it’s an anecdote buried in the ‘safety behaviors’ section of a capabilities paper from one of the relatively less famous AI companies. Most such sections are boilerplate, so most readers gloss over them.
- Jeremy Kalfus 8 Mar 2026 3:00 UTC
  4 points
  1
  I absolutely agree that that must have been the answer. But surely at least one person could’ve seen it (and genuinely processed its implications), no? Or at the very least, the researchers themselves could’ve shared it with the world.
  
  It makes me wonder what other secrets may be hiding in unpopular research papers, waiting to be mined.
  What links here?
  - Mo Putera's comment on Mo Putera’s Shortform by Mo Putera (9 Mar 2026 3:41 UTC; 7 points)