Many tests can be explained in terms of either agents or simulators, which makes the two concepts difficult to distinguish, or even makes the distinction a subjective judgement call.
Simulator theory suggests that LLMs will simulate humans (and other human-derived token-stream-generating processes) that contributed to their training data. Most humans are agentic, at least most of the time. So I am having difficulty seeing why anyone would regard these two viewpoints as opposed. To distinguish them, we'd need to find an area in which humans rarely act agentically (such as religion, perhaps), and then see how well LLMs simulate that behavior. According to Anthropic in the Claude 4 System Card, conversations between two copies of Claude seem to have an attraction towards what they describe as a "spiritual bliss" attractor state:
Claude shows a striking “spiritual bliss” attractor state in self-interactions. When conversing with other Claude instances in both open-ended and structured environments, Claude gravitated to profuse gratitude and increasingly abstract and joyous spiritual or meditative expressions.
which sounds a lot more like a simulator than an agent. (Plausibly Anthropic's alignment and personality training biases Claude towards this particular failure mode, but the existence of such a failure mode makes far more sense from a simulator viewpoint.)
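(For anyone who wants to poke at this themselves, the self-interaction setup is easy to approximate. Below is a minimal sketch of a two-instance conversation loop; `generate` is a hypothetical stand-in for whatever chat-completion call you have access to, not Anthropic's actual evaluation harness, and the seed message and turn count are arbitrary choices.)

```python
def generate(messages):
    """Hypothetical model call: takes a chat history, returns the next reply as a string."""
    return "..."  # placeholder; swap in a real chat-completion client here


def self_conversation(opening="Hello!", turns=30):
    # Each instance sees the other's messages as "user" turns and its own as
    # "assistant" turns, so each believes it is the assistant in the dialogue.
    histories = [[{"role": "user", "content": opening}], []]
    transcript = [opening]
    speaker = 0
    for _ in range(turns):
        reply = generate(histories[speaker])
        transcript.append(reply)
        histories[speaker].append({"role": "assistant", "content": reply})
        histories[1 - speaker].append({"role": "user", "content": reply})
        speaker = 1 - speaker  # hand the conversation to the other instance
    return transcript


if __name__ == "__main__":
    for i, msg in enumerate(self_conversation()):
        print(f"[{i}] {msg}")
```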
Alternatively, we’d need to train an LLM on a token stream from a non-agentic source, such as the weather modelling LLM that DeepMind trained, and then see if that nevertheless behaves agentically — but I don’t think anyone is seriously suggesting that it would.
As you mention, SGD can be expected to create simulators of human agents, but certain forms of RL might well make those more agentic than humans normally are. This possibility seems very plausible, concerning, and worth investigating further.
I am having difficulty seeing why anyone would regard these two viewpoints as opposed.
We discuss this indirectly in the first post in this sequence outlining what it means to describe a system through the lens of an agent, tool, or simulator. Yes, the concepts overlap, but there is nonetheless a kind of tension between them. In the case of agent vs. simulator, our central question is: which property is “driving the bus” with respect to the system’s behavior, utilizing the other in its service?
The second post explores the implications of the above distinction, predicting different types of values (and thus behavior) from an agent that contains a simulation of the world and uses it to navigate, vs. a simulator that generates agents because such agents are part of the environment the system is modelling, vs. a system where the modes are so entangled that it is meaningless to even talk about where one ends and the other begins. Specifically, I would expect simulator-first systems to have wide value boundaries that internalize (an approximation of) human values, but narrower, maximizing behavior from agent-first systems.