By default, MSM happens on the base model before any chat finetuning, so Llama doesn’t know it is Llama at that point, since it hasn’t been instruction-tuned yet. We teach the model that it is Llama after MSM, when we instruction-tune it. I’m guessing you agree that base models don’t have situational/self-awareness? Is this a concern only if we did MSM on an Instruct model that already knows it is Llama? (We only did the latter for the AM evals because we wanted to test whether it reduces misalignment in production models, but this is not the default order for doing MSM.)
Maybe it’s possible that a capable enough model could later figure out that some of the documents it saw in midtraining were synthetic (“these are alignment docs saying what they want me to display”), even if it saw them before it had any self-awareness. But things like making the docs more realistic/diverse should help.
Yep, I think we are on the same page. I was saying it’s important, not accusing you of failing at it. If I were you, I would have a section in the paper discussing it. (Maybe you do; I haven’t read the whole thing, sorry.)