I completely agree: Reinforcement Learning has a tendency to produce agents, at least when applied to a system that wasn’t previously agentic, whereas a transformer model trained on weather data would simulate weather systems, which are not agentic. I just think that, in the case of an LLM whose base model was trained on human data, which is currently what we’re trying to align, the normal situation is a simulation of a context-sensitive distribution of agents. If it has also undergone RL, as is often the case, it’s possible that that has made it “more agentic” in some meaningful sense, or at least induced some mode collapse in the distribution of agentic behaviors.
I haven’t yet had the chance to read all of your sequence, but I intend to, including the posts you link to.
Then I think we agree on questions of anticipated experience? I hope you enjoy the rest of the sequence; we should have a few more posts coming out soon :).
Having now read the sequence up to this point, I see that you pretty much already make all the points I would have made; in retrospect I think I was basically just arguing about terminology.