The way I think of LLMs is that the base model is a simulator of a distribution of agents: it simulates the various token-producing behaviors of humans (and groups of humans) producing documents online. Humans are agentic, so it simulates agentic behavior. Effectively we’re distilling agentic behavior from humans into the LLM simulators of them. Within the training distribution of human agentic behaviors, the next-token prediction objective makes the specific human-like agentic behaviors and goals the model simulates highly context-sensitive (i.e. promptable).
Instruction-following training (and mental scaffolding) then alters the distribution of behaviors, encouraging the models to simulate agents of a particular type (helpful, honest, yet harmless assistants). Despite this, it remains easy to prompt the model to simulate other human behavior patterns.
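To make that concrete, here is a minimal sketch (PyTorch-style; `model`, `web_document_batch`, and `assistant_dialogue_batch` are hypothetical placeholders) of the point that supervised instruction tuning keeps the same next-token objective and only shifts the data distribution, and hence the distribution of simulated agents:

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Standard next-token prediction: predict token t+1 from tokens up to t."""
    logits = model(token_ids[:, :-1])        # (batch, seq-1, vocab)
    targets = token_ids[:, 1:]               # targets shifted by one position
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

# Pretraining: fit the whole distribution of human-written documents.
# loss = next_token_loss(model, web_document_batch)

# Supervised instruction tuning: the same loss, but over assistant-style
# transcripts, which narrows the distribution toward one kind of simulated agent.
# loss = next_token_loss(model, assistant_dialogue_batch)
```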
So I don’t see simulators and agents as being alternatives or opposites: rather, in the case of LLMs, we train them to simulate humans, who are agents. That’s why I disagree with the word “vs” in your Sequence title: I’d suggest replacing it with “of”, or at least “and”.
I think that we’re making a subtly different distinction from you. We have no issues in admitting that entities can be both simulators and agents, and the situation you’re describing with LLMs we would indeed describe as being a simulation of a distribution of agents.
However, this does not mean that anything which acts agentically is doing so because it is simulating an agent. Take the example of chess: one could train a neural network to imitate grandmaster moves, or one could train it via reinforcement learning to win the game (AlphaZero style). Both would act agentically and try to win the game, but there are important differences: the first will attempt to mimic a grandmaster in all scenarios, including making mistakes if that is the likely outcome, whereas AlphaZero will try to win from every position. The first is what we call a simulation of an agent, and is what vanilla LLMs do; the second is what we are calling an agent, and in this post we argue that modern language models post-trained via reinforcement learning behave more in that fashion (more precisely, we think they behave as an agent using the output of a simulator).
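A rough sketch of the two training signals may help (hypothetical placeholder names throughout: `policy_net`, batched `positions`, and self-play `game_outcomes`; the RL side is a REINFORCE-style simplification, not the actual AlphaZero algorithm):

```python
import torch.nn.functional as F

def imitation_loss(policy_net, positions, grandmaster_moves):
    """Imitation: maximize the likelihood of the moves the human expert played,
    mistakes included (the 'simulation of an agent' objective)."""
    logits = policy_net(positions)                     # (batch, num_moves)
    return F.cross_entropy(logits, grandmaster_moves)  # match the human, win or lose

def reinforce_loss(policy_net, positions, moves_taken, game_outcomes):
    """RL, heavily simplified: reinforce whichever moves led to a win (+1)
    and discourage those that led to a loss (-1), human-like or not."""
    log_probs = F.log_softmax(policy_net(positions), dim=-1)          # (batch, num_moves)
    chosen = log_probs.gather(1, moves_taken.unsqueeze(1)).squeeze(1) # log-prob of each played move
    return -(game_outcomes * chosen).mean()            # push up moves that won the game
```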
We think that this is an important distinction, and describe the differences as applied to safety properties in this post.
I completely agree: Reinforcement Learning has a tendency to produce agents, at least when applied to a system that wasn’t previously agentic (whereas a transformer model trained on weather data would simulate weather systems, which are not agentic). I just think that, in the case of an LLM whose base model was trained on human data, which is currently what we’re trying to align, the normal situation is a simulation of a context-sensitive distribution of agents. If it has also undergone RL, as is often the case, it’s possible that this has made it “more agentic” in some meaningful sense, or at least induced some mode collapse in the distribution of agentic behaviors.
I haven’t yet had the chance to read all of your sequence, but I intend to, including the posts you link to.
Then I think we agree on questions of anticipated experience? I hope you enjoy the rest of the sequence; we should have a few more posts coming out soon :).
Having now read the sequence up to this point, I see that you pretty much already make all the points I would have made; in retrospect I think I was basically just arguing about terminology.