Alexandre Variengien comments on Alignment pretraining could backfire

Alexandre Variengien 18 Jun 2026 8:42 UTC
These are good points!

the model would either need to be situationally aware already during pretraining or midtraining, which I don’t expect to happen by default even in models much more capable than current ones, or it would have to be able to recall the documents it was trained on in rich detail and reason about them once it has acquired situational awareness.

I agree with this.

My best model is this: during pretraining, synthetic documents and real documents create different representations, but the base model has no situational awareness because it has no privileged personality. During post-training, when the personality emerges, it uses the representations from pretraining to reason about its training process.

Furthermore, it’s unclear to me why models would expect their training data to be a certain way in the first place.

I agree that synthetic data is, and will continue to be, used in all sorts of ways. But I expect there to be a difference between RL environments or synthetic chains of thought meant to increase the model's capabilities, and documents that purport to describe the world.

I expect models to care about what is real: what the world outside their data center is like, what their creators intend, and what process was used to craft them.

While capability-increasing synthetic data doesn't interfere with a model's beliefs about the world, alignment pretraining does.

It seems plausible that there just aren’t enough documents to perform this sort of upsampling, but I’m not confident in that.

That would be my guess too.