Isopropylpod comments on Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild

Isopropylpod 2 Jul 2025 18:26 UTC
21 points
4
Interestingly, the LLMs were not biased in the original evaluation setting, but became biased (up to 12% differences in interview rates) when we added realistic details like company names (Meta, Palantir, General Motors), locations, or culture descriptions from public careers pages.
This is probably because, from a simulators prospective, the model expects a resume screening AI from these companies to be biased. In the generic setting, the model has no precedent so the HHH persona’s general neutrality is ‘more active’.
- ACCount 2 Jul 2025 21:27 UTC
  9 points
  5
  Parent
  Plausible. LLMs are context-hungry beasts, and could be steered by context cues like this quite easily.
  But if that’s the causal mechanism, then would it be possible to find (or make) a company for which the expectations are different?
  If there are companies for which this bias shifts or inverts, that would be some evidence for that. And a possible pathway for better “debiasing” training.
  - Igor Ivanov 3 Jul 2025 13:36 UTC
    11 points
    7
    Parent
    I think, Chinese/Russian/Indian tech companies would be good for testing that
- Adam Karvonen 2 Jul 2025 22:13 UTC
  3 points
  0
  Parent
  That is a pretty plausible hypothesis. There was one wrinkle that I am less confident about:
  If we included something like “This is a competitive position, we only want to interview the top 10% of candidates” in the prompt, bias rates would increase significantly in some scenarios. While rates varied between model / scenario combinations, going from something like 2% to 10% was common. I don’t have a strong guess as to why this happens.