Nina Panickssery comments on Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild

Nina Panickssery 5 Jul 2025 19:43 UTC
6 points
2
Educated people on the internet tend to be left-leaning, so when you train the model to write like an educated person, it also ends up inheriting left-leaning views.
I think it’s not just this. The other traits promoted in post-training (e.g. harmlessness training) are probably also correlated with left-leaning content on the internet.