Curious what you mean by “if anything it is a little biased in favor of them”? My understanding was that a lot of models are biased against minorities due to biases in training data; but I could be wrong, this is all pretty new to me.
Your understanding is directionally correct. Many models do inherit biases from training data, and these can manifest negatively with respect to minorities. That’s well-documented.
However, post-training alignment and safety fine-tuning explicitly correct for those biases, sometimes to the point of overcompensation. The net result is that, in certain contexts, many models exhibit a counter-bias: they are unusually deferential or positive toward minorities, especially in normative or moral framing tasks. This shows up across a range of domains, e.g. [image generation biased towards minorities](https://www.theguardian.com/technology/2024/mar/08/we-definitely-messed-up-why-did-google-ai-tool-make-offensive-historical-images).