dr_s comments on Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild

dr_s 6 Jul 2025 11:39 UTC
2 points
−1

I’m missing a connection somewhere—who was assuming this? You mean people at the AI companies evaluating the results? Other researchers? The general public?

The companies that tried to fight bias by fine-tuning their models. My point is, people expected that the natural bias of base pretrained models would be picked up from the vibes of the sum of human culture as sampled by the training set, and would therefore be pro-men and likely pro-white (an expectation which, TBF, is strengthened by the fact that a lot of that culture would also be older). I don’t think that expectation was incorrect.

What I meant was, yeah, that was the original intent, and the attempts probably overshot (another example of how crude and approximate our alignment techniques are; we’re really still at the “drilling holes in the skull to drive out the evil spirits” stage of that science). But essentially, I’m saying the result is still very meaningful. Also, since discrimination of either sign remains illegal in many countries whose employers are likely using these models, there is still commercial value in simply getting it right rather than catering purely to the appearance of being sufficiently progressive.