I completely agree: Reinforcement Learning has a tendency to produce agents, at least when applied to a system that wasn’t previously agentic, whereas a transformer model trained on weather data would simulate weather systems, which are not agentic. I just think that, in the case of an LLM whose base model was trained on human data, which is currently what we’re trying to align, the normal situation is a simulation of a context-sensitive distribution of agents. If it has also undergone RL, as is often the case, it’s possible that that has made it “more agentic” in some meaningful sense, or at least induced some mode collapse in the distribution of agentic behaviors.
I haven’t yet had the chance to read all of your sequence, but I intend to, including the posts you link to.
Then I think we agree on questions of anticipated experience? I hope you enjoy the rest of the sequence; we should have a few more posts coming out soon :).
Having now read the sequence up to this point, I see that you pretty much already make all the points I would have made; in retrospect I think I was basically just arguing about terminology.