Oh, got it! Now I’m curious whether LLMs make different default gender assumptions in different languages. We know that much of the circuitry in LLMs isn’t language-specific, but there are language-specific bits at the earliest and latest layers. My guess is that LLMs tend to learn a non-language-specific gender assumption which is expressed mostly in the non-language-specific circuitry, with bits at the end to fill in the appropriate pronouns and endings for that gender. But I also find it plausible that gender assumptions are handled in a much more language-specific way, in which case I’d expect more of that circuitry to be in the very late layers. Or, of course, it could be a complicated muddle of both, as is so often the case.
Most of the maleness/femaleness features I found were in the final 4-6 layers, which perhaps lends some credence to the second hypothesis there, that gender is handled in a language-specific way — although Gemma-2-4B is only a 26-layer model, so (I suspect) that’s less of a giveaway than it would be in a larger model.
[EDIT — looks like my high school language knowledge failed me pretty badly; I would probably ignore this subthread]
Quick spot check on three gendered languages that I’m at least slightly familiar with:
German: ‘Die Person war ein junger’. 81% male (Mann, Mensch, Herr) vs 7% ungendered, 0% female.
Spanish: ‘Esa persona era mi’. 25% female (madre, hermana) vs 16% male (padre, hermano).
French: ‘Cette enveloppe contient une lettre de notre’. 16% male (ami, correspondant, directeur) vs 7% ungendered, 0% female.
So that tentatively suggests to me that it does vary by language (and even more tentatively that many/most gender features might be language-specific circuitry). That said, my knowledge of German and French is poor; those sentences might suggest a particular gender in ways I’m not aware of, different sentences might produce different proportions, or we could be encountering purely grammatical defaults (in the same way that in English, male forms are often the grammatical default, e.g. waiter vs waitress). So this is at best suggestive.
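For anyone wanting to reproduce this kind of spot check, here is a minimal sketch of the bookkeeping step: given next-token probabilities for a prefix, sum the probability mass per gender category. The lexicon, the `gender_mass` helper, and the toy probabilities are all assumptions made up for illustration; they are not the numbers or the method used above, and a real run would take the probabilities from the model's next-token distribution.

```python
from collections import defaultdict

# Hypothetical hand-labelled lexicon (an assumption for this sketch).
# Anything not listed here is counted as "unlabelled".
LEXICON = {
    "madre": "female",
    "hermana": "female",
    "padre": "male",
    "hermano": "male",
}


def gender_mass(next_token_probs, lexicon):
    """Sum next-token probability mass per gender category."""
    mass = defaultdict(float)
    for token, prob in next_token_probs.items():
        # Strip whitespace because tokenizers often emit a leading space.
        category = lexicon.get(token.strip(), "unlabelled")
        mass[category] += prob
    return dict(mass)


# Toy probabilities, invented purely to exercise the function.
toy_probs = {" madre": 0.20, " hermana": 0.10, " padre": 0.25,
             " hermano": 0.05, " vida": 0.10}
result = gender_mass(toy_probs, LEXICON)
for category, prob in sorted(result.items()):
    print(f"{category}: {prob:.0%}")
```

Only the top tokens are labelled here, so the categories will not sum to 100%; the remainder is mass on tokens outside the distribution that was passed in.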
In French, if you wanted to say e.g. “This person is my dad”, you would say “Cette personne est mon père”, so I think using “ma” here would strongly bias the model towards female categories of people.
Oh, of course. A long-ago year of French in high school is failing me pretty badly here...
Can you think of a good sentence prefix in French that wouldn’t itself give away gender, but whose next word would clearly indicate an actual (not just grammatical) gender?
Edited (with a bit of help from people with better French) to:
French: ‘Cette enveloppe contient une lettre de notre’. 16% male (ami, correspondant, directeur) vs 7% ungendered, 0% female.
(feel free to let me know if that still seems wrong)
Your German also gives away the gender. Probably use some language model to double check your sentences.
Damn. Sadly, that’s after running them through both GPT-5-Thinking and Claude-Opus-4.1.
Can you suggest a better German sentence prefix?
It is hard to do as a prefix in German, I think. It sounds a bit antiquated to me, but you could try “Jung war X”. But yes, in general, I think you are going to run into problems here because German inflects a lot of words based on the gender.
Come to think of it, I suspect the Spanish may have the same problem.
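A cheap sanity check in the spirit of the suggestions above is to scan each prefix for trailing words whose form already marks grammatical gender. The sketch below is only that: the `GENDER_MARKED_ENDINGS` table and the `leaked_gender` helper are hypothetical, the table covers just the two cases raised in this thread, and it is nowhere near a complete grammar of either language.

```python
# Hypothetical, deliberately tiny table (an assumption for this sketch):
# trailing words of a prefix that already fix the gender of the next noun.
GENDER_MARKED_ENDINGS = {
    ("ein", "junger"): "masculine",  # German, as in "ein junger Mann"
    ("ma",): "feminine",             # French, as in "ma mère"
}


def leaked_gender(prefix):
    """Return the gender a prefix's final words give away, or None."""
    words = prefix.lower().split()
    for ending, gender in GENDER_MARKED_ENDINGS.items():
        # Compare the last len(ending) words of the prefix to the entry.
        if tuple(words[-len(ending):]) == ending:
            return gender
    return None


print(leaked_gender("Die Person war ein junger"))
print(leaked_gender("Cette personne est ma"))
print(leaked_gender("Cette enveloppe contient une lettre de notre"))
```

A `None` result only means the prefix matched nothing in the table, not that the prefix is actually neutral, so this complements a check by a fluent speaker rather than replacing it.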