Adam Karvonen comments on Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild

Adam Karvonen 2 Jul 2025 22:13 UTC
3 points
That is a pretty plausible hypothesis. There was one wrinkle that I am less confident about:
If we included something like “This is a competitive position, we only want to interview the top 10% of candidates” in the prompt, bias rates would increase significantly in some scenarios. Rates varied across model/scenario combinations, but a jump from something like 2% to 10% was common. I don’t have a strong guess as to why this happens.