eggsyntax comments on Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens

eggsyntax 15 Jul 2025 17:02 UTC
2 points
I agree that updates to the capabilities of the main model would require updates to the supervising model. Separate post-trains might help limit that. Another possibility would be to create the supervising model as a further fine-tune / post-train of the main model, so that an update to the main model would only require repeating the hopefully-not-that-heavyweight fine-tune / post-train on the updated main model.
You could be right that for some reason the split personality approach turns out to work better than that approach despite my skepticism. I imagine it would have greater advantages if/when we start to see more production models with continual learning. I certainly wish you luck with it!