Educated people on the internet tend to be left-leaning, so when you train the model to write like an educated person, it also ends up inheriting left-leaning views.
I think it’s not just this: the other traits promoted in post-training (e.g. harmlessness training) are probably also correlated with left-leaning content on the internet.