Interestingly, the LLMs were not biased in the original evaluation setting, but became biased (up to 12% differences in interview rates) when we added realistic details like company names (Meta, Palantir, General Motors), locations, or culture descriptions from public careers pages.
This is probably because, from a simulators prospective, the model expects a resume screening AI from these companies to be biased. In the generic setting, the model has no precedent so the HHH persona’s general neutrality is ‘more active’.
Plausible. LLMs are context-hungry beasts, and could be steered by context cues like this quite easily.
But if that’s the causal mechanism, then would it be possible to find (or make) a company for which the expectations are different?
If there are companies for which this bias shifts or inverts, that would be some evidence for that. And a possible pathway for better “debiasing” training.
That is a pretty plausible hypothesis. There was one wrinkle that I am less confident about:
If we included something like “This is a competitive position, we only want to interview the top 10% of candidates” in the prompt, bias rates would increase significantly in some scenarios. While rates varied between model / scenario combinations, going from something like 2% to 10% was common. I don’t have a strong guess as to why this happens.
This is probably because, from a simulators prospective, the model expects a resume screening AI from these companies to be biased. In the generic setting, the model has no precedent so the HHH persona’s general neutrality is ‘more active’.
Plausible. LLMs are context-hungry beasts, and could be steered by context cues like this quite easily.
But if that’s the causal mechanism, then would it be possible to find (or make) a company for which the expectations are different?
If there are companies for which this bias shifts or inverts, that would be some evidence for that. And a possible pathway for better “debiasing” training.
I think, Chinese/Russian/Indian tech companies would be good for testing that
That is a pretty plausible hypothesis. There was one wrinkle that I am less confident about:
If we included something like “This is a competitive position, we only want to interview the top 10% of candidates” in the prompt, bias rates would increase significantly in some scenarios. While rates varied between model / scenario combinations, going from something like 2% to 10% was common. I don’t have a strong guess as to why this happens.