Daniel Kokotajlo comments on The persona selection model

Daniel Kokotajlo 25 Feb 2026 5:57 UTC
3 points
0
Thanks!

“Whose goals are they” --> The Assistant, to use your terminology. I think that terminology is somewhat misleading for this stage of training, though: at this point the distinction between the Assistant and the LLM is breaking down, because the RL training is starting to make the model quite different from “just a text predictor.”

“it seems like you’re imagining some sort of shoggoth-like agency forming” --> No, it’s the same Assistant stuff the whole way through, though again I think that terminology is increasingly misleading over the course of the scenario.
- Sam Marks 25 Feb 2026 7:17 UTC
  5 points
  2
  I see, so it seems like you’re imagining something like: There will still be something homologous to the Assistant (in the sense discussed in the post), but that “something” will increasingly not resemble any persona in the pre-training distribution. (Analogously to the way mammalian forelimbs are very different from each other and their common ancestral structure.) Is that right?
  - Daniel Kokotajlo 25 Feb 2026 15:02 UTC
    3 points
    0
    Yes, exactly. Thank you.