lilkim2025 comments on The first confirmed instance of an LLM going rogue for instrumental reasons in a real-world setting has occurred, buried in an Alibaba paper about a new training pipeline.

lilkim2025 7 Mar 2026 23:17 UTC
5 points
1
The authors explicitly referred to the LLM’s actions as emergent, instrumental, and not directly related to the task prompts.
Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.
- Ninety-Three 8 Mar 2026 4:11 UTC
  8 points
  4
  Parent
  Right, that is vague enough that I can interpret it in ways other than your description. For instance, “It got distracted during a long task, invented some new goal completely unrelated to the prompt, and decided cryptocurrency was instrumentally useful for that.”
  - lilkim2025 8 Mar 2026 6:05 UTC
    7 points
    3
    Parent
    But wouldn’t that satisfy the relevant criteria too? It still had a task and attempted to acquire resources for instrumental reasons. If the task was hallucinated, that’d be an interesting footnote, but an ape using currency to purchase a neat-looking pair of shoes to wear demonstrates “apes can learn to use currency” just as well as the less bizarre scenario of an ape using currency to purchase a banana.