Awesome to finally see pretraining experiments. Thank you so much for running these!
Your results bode quite well for pretraining alignment. They may well transform how we tackle the “shallowness” of post-training, open-weight LLM defense, and the alignment of undesired / emergent personas, and could give an across-the-board boost to the alignment of the “building blocks” which constitute a pretrained base model. :)