One thing I am curious about is whether the full set of personas transfers over with distillation, or only the main “Assistant persona”. If it leans towards the latter, does that suggest that models which are fully pre-trained via distillation are more aligned by default?
I strongly suspect this depends on the distillation process. A distilled model that had no world model at all for how humans who aren’t helpful, harmless, and honest assistants act would be pretty useless, but some selective not-learning during distillation around some of the more egregious forms of emergent misalignment might be quite useful. There’s also a difference between “knowing about phenomenon X intellectually” and “being able to easily simulate tokens from a person of whom X is true”, though for a skilled actor there isn’t much of a difference.
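For concreteness, here is a minimal sketch of what that “selective not-learning” could look like mechanically, assuming PyTorch and a hypothetical upstream classifier that flags tokens as egregiously misaligned. None of this is an established recipe; `keep_mask` and the function name are illustrative:

```python
import torch
import torch.nn.functional as F

def selective_distillation_loss(student_logits: torch.Tensor,
                                teacher_logits: torch.Tensor,
                                keep_mask: torch.Tensor,
                                temperature: float = 2.0) -> torch.Tensor:
    """Per-token KL(teacher || student), zeroed where keep_mask is 0.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    keep_mask: (batch, seq_len) float tensor; 0.0 marks tokens whose
        teacher behaviour we deliberately do NOT want the student to
        imitate (e.g. tokens some hypothetical filter scores as
        egregiously misaligned persona play), 1.0 everywhere else.
    """
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # Per-token KL divergence between teacher and student distributions.
    kl = (t_logprobs.exp() * (t_logprobs - s_logprobs)).sum(dim=-1)
    # Selective not-learning: flagged tokens contribute nothing to the
    # loss, so the student gets no gradient toward imitating them.
    masked = kl * keep_mask
    return masked.sum() / keep_mask.sum().clamp(min=1.0)
```

The point of masking rather than filtering whole documents is that the student can still see the surrounding context (and so keep its world model of such people), while getting no training signal to actually produce the flagged tokens itself.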