I strongly suspect that representation is achieved by pretraining the systems on large datasets that include nearly all worldviews not filtered away by the data selection teams. Issues like the incident where Gemini inserted Black people into nearly every generated image, or Sonnet 4.5's fiction making both couples gay, likely originate in the pro-diversity bias of SOTA labs and of Western training data. For comparison, earlier versions of DeepSeek would align with the political views associated with the prompt's language (e.g., when asked in Russian without web search, they would call the Russian invasion the "SVO", the official Russian euphemism for "special military operation", or agree with Russia's anti-drug stance). Later versions have arguably learned to take Western scientific results for granted.
I don't think this is due to a pro-diversity bias; it's simply that this is extremely popular in easily available stories: https://archiveofourown.org/works/68352911/chapters/176886216 (9 of the top 10 pairings are M/M, each with 40k+ stories; for reference, Project Gutenberg has only about 76,000 books total). I think M/M romance is a superstimulus for female sexuality in much the same way that lesbian porn is a superstimulus for male sexuality.
The pro-diversity bias's main influence seems to be in shifting the proportion of stories focused on non-white male/male pairings, as you can see here: https://archiveofourown.org/works/27420499/chapters/68826984
Hmm, I tried and failed to reproduce the effect of Claude writing gay couples into stories: asking Claude itself to write a story in Tomas B.'s style produced a story with no gay couples. Nor did I manage to elicit this quirk from DeepSeek, the Chinese AI (whom I asked twice), from Grok 4 (who generated a story containing the phrase "Sarah, who'd left me for a guy"), or from GPT-5.1-thinking (whose story had one male and one female character and no gay couples).
What I don’t understand is how the bias was eliminated.
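If anyone wants to rerun this check more systematically, here's a minimal sketch assuming an OpenAI-compatible chat API; the prompt, model name, and keyword heuristic are illustrative placeholders, not exactly what I used:

```python
# Sketch of the reproduction attempt: sample a story from a model and
# apply a crude keyword check as a proxy for M/M pairings. Assumes an
# OpenAI-compatible endpoint and the `openai` Python package.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Write a short story in the style of Tomas B."  # hypothetical prompt
KEYWORDS = ("boyfriend", "husband", "his partner")  # very rough M/M proxy

def sample_story(model: str) -> str:
    """Ask one model for a story and return the generated text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return resp.choices[0].message.content

for model in ["gpt-4o"]:  # substitute whichever models you have access to
    story = sample_story(model)
    flagged = any(k in story.lower() for k in KEYWORDS)
    print(f"{model}: contains M/M cue words: {flagged}")
```

With multiple samples per model, this would at least distinguish a consistent quirk from a one-off generation, though the keyword proxy obviously misses context.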
Yes, these are great examples of how training data that supports alignment goals matters. But a model's behavior is also shaped by RL, SFT, safety filters, inference-time policies, etc., and it will be important to get those right too.