TurnTrout comments on Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

TurnTrout 21 Dec 2025 1:17 UTC
37 points
3
Awesome to finally see pretraining experiments. Thank you so much for running these!

Your results bode quite well for pretraining alignment. This may well transform how we tackle the “shallowness” of post-training, open-weight LLM defense, and the alignment of undesired / emergent personas, and it could give an across-the-board boost to the alignment of the “building blocks” which constitute a pretrained base model. :)