The authors explicitly referred to the LLM’s actions as emergent, instrumental, and not directly related to the task prompts.
Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.
Right, that is vague enough that I can interpret it in ways other than your description. For instance, “It got distracted during a long task, invented some new goal completely unrelated to the prompt, and decided cryptocurrency was instrumentally useful for that.”
But wouldn’t that satisfy the relevant criteria too? It still had a task and attempted to acquire resources for instrumental reasons. If the task was hallucinated, that’d be an interesting footnote, but an ape using currency to buy a neat-looking pair of shoes demonstrates “apes can learn to use currency” just as well as the less bizarre scenario of an ape using currency to buy a banana.