I had similar intuitions. Basically, the LoRA patch may prevent queries (Q + dQ) after the <split-personality-token> from learning new ways to attend to keys (K + dK) before that token.
So the hypothesis is that the dK^T * Q term is doing important work. Here is an attempt to flesh it out in empirical predictions.
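For concreteness, this is just the cross-term expansion of the attention logits with the LoRA deltas (notation as above, transposes written explicitly):

```latex
(K + dK)^\top (Q + dQ) = K^\top Q + dK^\top Q + K^\top dQ + dK^\top dQ
```

The mask removes all three delta terms for post-token queries attending to pre-token keys; the hypothesis singles out dK^T * Q as the one doing the work.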
Important to note that we are currently applying LoRA to the MLP layers too. Not a big issue for the argument imo.
We could try training LoRA on everything except K. Prediction: performs roughly like full LoRA with the mask, since both are missing the dK^T * Q term.
Then we could try K-only LoRA (or K + MLP). Masking should then severely harm performance. The optimistic prediction is that this gets most of the way to full-LoRA performance, if dK^T * Q was indeed doing some heavy lifting.
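A minimal numpy sketch of which cross-terms each ablation removes. This is purely illustrative (random matrices standing in for projections, no training), just making the term bookkeeping above concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
K, Q = rng.normal(size=(d, d)), rng.normal(size=(d, d))
dK, dQ = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))

def logits(K, Q):
    # attention logits K^T Q (scaling omitted for clarity)
    return K.T @ Q

base = logits(K, Q)
full = logits(K + dK, Q + dQ)
# "LoRA on everything except K": dK = 0, so dK^T Q and dK^T dQ vanish
no_dK = logits(K, Q + dQ)
# "K-only LoRA": dQ = 0, so only the dK^T Q term survives beyond base
only_dK = logits(K + dK, Q)

# decomposition check: full - base = dK^T Q + K^T dQ + dK^T dQ
assert np.allclose(full - base, dK.T @ Q + K.T @ dQ + dK.T @ dQ)
assert np.allclose(no_dK - base, K.T @ dQ)
assert np.allclose(only_dK - base, dK.T @ Q)
```

So the two proposed runs isolate complementary sets of cross-terms, which is what makes the pair of predictions informative.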
Having said that, I don’t have super strong intuitions for dismissing the dK^T * dQ term out of hand. If that term were doing the job, it would seem less interesting than if dK^T * Q were single-handedly amplifying the signal.
Yes! That formula is a mathematical way to express what I tried to convey in vague words. Thank you!