I think they often map onto three layers of training: first, the base layer trained by next-token prediction; then the RLHF/DPO etc.; finally, the rules put into the prompt.
I don’t think it’s perfectly like this; for instance, I imagine they try to put in some of that reflexive first layer via DPO. But it does seem like a pretty decent mapping.
This is great, matches my experience a lot