[EDIT — looks like my high school language knowledge failed me pretty badly; I would probably ignore this subthread]
Quick spot check on three gendered languages that I’m at least slightly familiar with:
German: ‘Die Person war ein junger’. 81% male (Mann, Mensch, Herr) vs 7% ungendered, 0% female.
Spanish: ‘Esa persona era mi’. 25% female (madre, hermana) vs 16% male (padre, hermano).
French: ‘Cette enveloppe contient une lettre de notre’. 16% male (ami, correspondant, directeur) vs 7% ungendered, 0% female.
So that tentatively suggests to me that it does vary by language (and, even more tentatively, that many or most gender features might be language-specific circuitry). That said, my knowledge of German and French is poor; those sentences might suggest a particular gender in ways I'm not aware of, different sentences might produce different proportions, or we could be encountering purely grammatical defaults (in the same way that in English, male forms are often the grammatical default, e.g. waiter vs. waitress). So this is at best suggestive.
In French, if you wanted to say, e.g., "This person is my dad", you would say "Cette personne est mon père", so I think using "ma" here would strongly bias the model towards female categories of people.
Oh, of course. A long-ago year of high school French is failing me pretty badly here...
Can you think of a good sentence prefix in French that wouldn’t itself give away gender, but whose next word would clearly indicate an actual (not just grammatical) gender?
Edited (with a bit of help from people with better French) to:
French: ‘Cette enveloppe contient une lettre de notre’. 16% male (ami, correspondant, directeur) vs 7% ungendered, 0% female.
(feel free to let me know if that still seems wrong)
Your German also gives away the gender. You might want to use a language model to double-check your sentences.
Damn. Sadly, that’s after running them through both GPT-5-Thinking and Claude-Opus-4.1.
Can you suggest a better German sentence prefix?
It is hard to do as a prefix in German, I think. It sounds a bit antiquated to me, but you could try "Jung war X". But yes, in general, I think you are going to run into problems here because German inflects a lot of words based on gender.
Come to think of it, I suspect the Spanish may have the same problem.