Identity-confusion largely exacerbates EM regardless of whether it’s applied before or after EM finetuning. Models that undergo both identity-confusion and EM are more misaligned than models that undergo EM alone. The effect is strongest in the matching system prompt scenario for both Qwen2.5-32B and Seed-36B.
This fits with the observation that most EM training datasets induce multiple different personas with different motivations/characteristics. Confusing the model's identity would make inducing them easier.