I think that we’re making a subtly different distinction from you. We have no issue with admitting that entities can be both simulators and agents, and the situation you’re describing with LLMs is indeed one we would describe as a simulation of a distribution of agents.
However, this does not mean that anything which acts agentically does so because it is simulating an agent. Take the example of chess: one could train a neural network to imitate grandmaster moves, or one could train it via reinforcement learning to win the game (AlphaZero-style). Both would act agentically and try to win, but there are important differences. The first will attempt to mimic a grandmaster in all scenarios, including making mistakes when that is the likely outcome; AlphaZero will, in every position, try to win. The first is what we call a simulation of an agent, and is what vanilla LLMs do; the second is what we call an agent, and in this post we argue that modern language models post-trained via reinforcement learning behave more in that fashion (more precisely, we think they behave as an agent using the output of a simulator).
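To make the chess example concrete, here is a minimal sketch (my own illustration, not from the post) of how the two training regimes differ: the imitation objective matches the grandmaster's move distribution, blunders and all, while the RL objective only cares about the game outcome. Names like `ChessNet`, `expert_moves`, and `game_outcome` are hypothetical placeholders.

```python
# Illustrative sketch: the same toy policy network trained two different ways.
import torch
import torch.nn.functional as F

class ChessNet(torch.nn.Module):
    """Toy policy network: board features in, move logits out."""
    def __init__(self, n_features=64, n_moves=4096):
        super().__init__()
        self.body = torch.nn.Linear(n_features, 256)
        self.head = torch.nn.Linear(256, n_moves)

    def forward(self, board):
        return self.head(torch.relu(self.body(board)))

# 1) Imitation ("simulation of an agent"): minimise cross-entropy against
#    whatever move the grandmaster actually played -- mistakes included.
def imitation_step(net, opt, boards, expert_moves):
    loss = F.cross_entropy(net(boards), expert_moves)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Reinforcement learning ("agent"): REINFORCE-style update that only
#    rewards winning, with no pressure to match any human move distribution.
def rl_step(net, opt, boards, chosen_moves, game_outcome):
    logp = F.log_softmax(net(boards), dim=-1)
    logp_chosen = logp.gather(1, chosen_moves.unsqueeze(1)).squeeze(1)
    loss = -(game_outcome * logp_chosen).mean()  # game_outcome: +1 win, -1 loss
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the contrast is that the first objective is optimised even when the network reproduces human blunders, whereas the second is only optimised by winning.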
We think that this is an important distinction, and describe the differences as applied to safety properties in this post.
I completely agree: reinforcement learning has a tendency to produce agents, at least when applied to a system that wasn’t previously agentic, whereas a transformer model trained on weather data would simulate weather systems, which are not agentic. I just think that, in the case of an LLM whose base model was trained on human data, which is currently what we’re trying to align, the normal situation is a simulation of a context-sensitive distribution of agents. If it has also undergone RL, as is often the case, it’s possible that this has made it “more agentic” in some meaningful sense, or at least induced some mode collapse in the distribution of agentic behaviors.
I haven’t yet had the chance to read all of your sequence, but I intend to, including the posts you link to.
Then I think we agree on questions of anticipated experience? I hope you enjoy the rest of the sequence; we should have a few more posts coming out soon :).
Having now read the sequence up to this point, I find that you pretty much already make all the points I would have made; in retrospect, I think I was basically just arguing about terminology.