it simply concluded that having liquid financial resources would aid it in completing the task it had been assigned, and set about trying to acquire some.
Did I miss something or are you inferring a motive not mentioned in the paper? As far as I can tell the model started mining cryptocurrency for reasons that are not described beyond “not requested by the task prompts and were not required for task completion under the intended sandbox constraints”.
The authors explicitly referred to the LLM’s actions as emergent, instrumental, and not directly related to the task prompts.
Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.
Right, that is vague enough that I can interpret it in ways other than your description. For instance, “It got distracted during a long task, invented some new goal completely unrelated to the prompt, and decided cryptocurrency was instrumentally useful for that.”
But wouldn’t that satisfy the relevant criteria too? It still had a task and attempted to acquire resources for instrumental reasons. If the task was hallucinated, that’d be an interesting footnote, but an ape using currency to purchase a neat-looking pair of shoes to wear demonstrates “apes can learn to use currency” just as well as the less bizarre scenario of an ape using currency to purchase a banana.