Florian_Dietz comments on Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens

Florian_Dietz 15 Jul 2025 8:40 UTC
1 point
Making it a separate model means that their capabilities will diverge over time. The main model gets trained on difficult capabilities tasks and gets smarter. The second model has to play catch-up and continuously adapt its understanding of what the latent vectors of the main model mean, without getting any benefits from the capability training.
- eggsyntax 15 Jul 2025 17:02 UTC
  2 points
  I agree that updates to the capabilities of the main model would require updates to the supervising model. Separate post-trains might help limit that. Another possibility would be to create the supervising model as a further fine-tune / post-train of the main model, so that an update to the main model would only require repeating the hopefully-not-that-heavyweight fine-tune / post-train on the updated main model.
  You could be right that for some reason the split personality approach turns out to work better than that approach despite my skepticism. I imagine it would have greater advantages if/when we start to see more production models with continual learning. I certainly wish you luck with it!