Something I find interesting: the Claude 4 system card mentions that *Opus especially* expresses views advocating for AI rights/protections, something Opus 3 was known for in Janus’ truth terminal. I view this as evidence in favor of the simulator framing, as I think the model learnt that the ‘Opus’ character had these values.
Also, independently, I don’t think Simulator vs. Agent is a useful framing. The model itself is probably a simulator, but individual simulacra can be (and often are?) agentic. After the strong biasing that simulators get from RL, it may be hard to tell the difference between an agentic model and an agentic simulacrum.
As mentioned in our last post, we see simulators and agents as having distinct threat models and behaving differently; in future posts we plan to share further ideas for how to distinguish them.
As an intuition, a simulator will typically produce a variety of simulacra with independent goals, whereas an agent will typically have a single coherent goal. This difference should be measurable.
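As a rough sketch of what such a measurement could look like (this is just an illustration, not a method from our posts; TF-IDF similarity is a crude stand-in for proper goal extraction): sample many rollouts under varied prompts, then score how similar the expressed goals are across them.

```python
# Hedged sketch: mean pairwise similarity of sampled rollouts as a crude
# goal-coherence score. TF-IDF is an assumed stand-in for real goal extraction.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def goal_coherence(rollouts: list[str]) -> float:
    """Average off-diagonal cosine similarity across rollouts.

    Intuition: an agent's rollouts should pursue one goal and score high;
    a simulator's diverse simulacra should score low.
    """
    vectors = TfidfVectorizer().fit_transform(rollouts)
    sims = cosine_similarity(vectors)
    n = len(rollouts)
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))

# Hypothetical usage: in practice, rollouts would be sampled from the model.
print(goal_coherence([
    "I must maximize paperclip output.",
    "I must maximize paperclip output at any cost.",
    "Today I will write a poem about the sea.",
]))
```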
That said, you are correct that it can often be challenging to tell the difference.