Cam comments on Daniel Tan’s Shortform

Cam 9 Nov 2025 17:52 UTC
1 point
0
We’re getting fairly close to the point where I would pretty strongly advise against having alignment training as the last stage of your training pipeline, due to goal crystallization / alignment faking concerns. From what I understand of the open-weight literature, RLHF typically comes after RLVR, though training practices vary a fair bit among open-weight models in general.