I think they often map onto three layers of training: first, the base layer trained by next-token prediction; then the RLHF/DPO etc.; finally, the rules put into the prompt.
I don’t think it’s perfectly like this; for instance, I imagine they try to put in some of that reflexive first layer via DPO. But it does seem like a pretty decent mapping.
This is great, matches my experience a lot