Post-training can be viewed as updating this distribution using training episodes as evidence. When training an AI assistant on an (input x, output y) pair, hypotheses that predict the Assistant would respond with y to x are upweighted; hypotheses that predict otherwise are downweighted.
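As a toy sketch of that framing: treat each persona as a hypothesis with a prior weight and a likelihood of producing the observed y, and apply Bayes' rule. The personas and numbers below are invented for illustration; PSM itself does not specify them.

```python
# Toy Bayesian update over candidate Assistant personas, given one
# (x, y) training pair. All values here are illustrative assumptions.

def bayes_update(prior, likelihoods):
    """Posterior over personas after observing one training episode.

    prior:       {persona: P(persona)}
    likelihoods: {persona: P(Assistant outputs y for x | persona)}
    """
    unnorm = {p: prior[p] * likelihoods[p] for p in prior}
    z = sum(unnorm.values())
    return {p: w / z for p, w in unnorm.items()}

prior = {"helpful": 0.5, "sarcastic": 0.3, "refusing": 0.2}
# Suppose the observed y reads like a helpful answer:
likelihoods = {"helpful": 0.9, "sarcastic": 0.3, "refusing": 0.05}

posterior = bayes_update(prior, likelihoods)
# Personas that predicted y gain probability mass; the rest lose it.
```

Under this picture a single episode never creates a persona with zero prior mass, which is what makes the new-capabilities case below interesting.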
PSM does not rule out learning of new capabilities during post-training. For example, no persona learned during pre-training knows how to use Anthropic’s syntax for tool calling; that capability is learned during post-training. PSM explains this as the LLM learning that the Assistant knows how to use this syntax. The important thing is that the LLM still models the Assistant as being an enacted persona.
This is more than just shifting the distribution over personas learned via pretraining (unless it can be framed as shifting the probability of a tool-calling assistant from 0 to nonzero).
Importantly, does PSM implicitly claim that personas' propensities do not shift during post-training? Why should this be so, and what conceptual difference between propensities and capabilities might cause it?
2. IIUC, PSM does not make strong predictions about what happens with a lot of post-training (the threshold I'm most interested in here is whatever amount of post-training the first AGI will have).
In what ways might you expect PSM to break down if we do a lot of RL on tiny models?
3. For the purposes of this post, SFT is conceptually clubbed with post-training RL, but mechanistically I see SFT as just pretraining but stronger (per token). I expect SFT to be able to do the same things pretraining does, i.e. create new personas. Is this the sort of thing that you would term ‘non-persona agency’ arising from heavy amounts of post-training?
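The "SFT is just pretraining" intuition can be made concrete: both minimize next-token cross-entropy, and SFT simply restricts the loss to curated assistant tokens. The tiny sketch below is framework-agnostic and all values are illustrative, not any particular training stack's API.

```python
import math

# Illustrative sketch: SFT uses the same next-token cross-entropy
# objective as pretraining, just masked to the assistant's tokens,
# so the per-token gradient signal has the same form.

def next_token_ce(logprobs, targets, mask=None):
    """Mean cross-entropy over selected positions.

    logprobs: per-position log-probability vectors over a toy vocab
    targets:  correct next-token id at each position
    mask:     which positions contribute (None = all, as in pretraining)
    """
    idx = [i for i in range(len(targets)) if mask is None or mask[i]]
    return -sum(logprobs[i][targets[i]] for i in idx) / len(idx)

# Toy vocab of 2 tokens; the model assigns 0.9 to token 0 everywhere.
logprobs = [[math.log(0.9), math.log(0.1)]] * 4
targets = [0, 0, 1, 0]

# Pretraining: loss over every position in the sequence.
pretrain_loss = next_token_ce(logprobs, targets)
# SFT: identical objective, restricted to the assistant's turn
# (here, the last two positions).
sft_loss = next_token_ce(logprobs, targets, mask=[0, 0, 1, 1])
```

On this view the only mechanistic differences from pretraining are the data distribution and the effective per-token weight, which is why one might expect SFT, like pretraining, to be able to create new personas rather than only reweight existing ones.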