This is very clear. Thank you; it will be my new go-to to send to people who want to understand why LLMs act as they do. It does a good job explaining how a lot of very different data has a simple explanation.
I don’t think you cite the recent Tice and Radmard paper on Alignment Pretraining, but of course this meshes well with PSM.
Thanks!
(And good point on Tice et al.—I’ve just edited the post to mention it. Sorry for missing it; the original draft of this post was completed before their paper came out.)
Tice and Radmard* ♡