Currently, AGI is mostly being developed by human engineers and scientists within human social systems. [...] There are far fewer literature professors, historians, anthropologists, creatives, social workers, landscape design architects, restaurant workers, farmers, etc., who are intimately involved in creating AGI. This isn’t surprising or illogical, but if AI is likely to be useful to “everyone” in some way (à la radio, computers), then “everyone” probably needs to be involved.
This concern seems somewhat misdirected.
There weren’t a lot of landscape design architects or farmers involved in the development of radio or computers. It was done by engineers, product managers, technology hobbyists, research scientists, logicians, etc., along with economic demand from commerce, the military, and other users capable of specifying their needs and paying the engineers to do it.
Were landscape architects excluded from developing radio? Did anyone prevent farmers from developing computers? No, they were just busy doing landscape design and farming. Eventually someone built computer systems for architects and farmers to use to get more architecting and farming done.
And then the product managers and sales people made sure that they charged the architects and farmers a butt-ton of money. Downstream of that is why both the farmers and the open-source folks have a problem with John Deere’s licensing and enforcement practices; and the architects ain’t particularly thrilled by Autodesk’s behavior either.
You can’t align AGI with the CEV of engineers to the exclusion of other humans, because engineers are not that different from other humans. That’s not the problem. But aligning AGI with “number go up” to the exclusion of other human values, that’s a problem. Even people who like capitalism don’t tend to believe that capitalism is aligned with all human values. That’s something to worry about.
This is an important point that I needed to be much clearer about—thank you. I’ll try to be more explicit:
First, AGI is not the same as tech historically, where you’re making tools and solving for PMF (product-market fit). AGI is distinct, and my radio/computers analogy muddled this point. Radios didn’t inherit the worldviews of Marconi et al., and transistors didn’t generalize the moral intuitions of the Bell Labs engineers. Basically, those tools weren’t absorbing and learning values; where they solved for PMF, AGI is solving for alignment. AGI learns behavioral priors directly from human judgments (RLHF, reward modeling, etc.) and internalizes/represents the structures of the data and norms it’s trained on. It forms generalizations about concepts like “harm” or “helpfulness” or “fairness” from those inputs, then scales and deploys those generalizations to users, domains, and cultures far beyond its creators’.
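To make the “learns from human judgments” part concrete, here’s a minimal, hypothetical sketch of Bradley-Terry-style pairwise reward modeling (not any lab’s actual pipeline; the model, feature shapes, and names are invented). The only ground truth the model ever sees is which response an annotator preferred:

```python
# Toy sketch of pairwise-preference reward modeling (Bradley-Terry style),
# illustrating how annotator judgments become the training signal.
# All names and shapes here are illustrative, not any lab's actual pipeline.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # Stand-in for a pretrained transformer encoder; kept trivial so the
        # sketch is self-contained.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh())
        self.value_head = nn.Linear(hidden_dim, 1)  # scalar "goodness" score

    def forward(self, response_features: torch.Tensor) -> torch.Tensor:
        return self.value_head(self.encoder(response_features)).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    # The only label is which response the annotator preferred, so the
    # annotator pool's notion of "better" is exactly what gets learned.
    return -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

# Usage: each row is a (chosen, rejected) pair labeled by a human annotator.
model = RewardModel()
chosen = torch.randn(8, 768)    # features of responses annotators preferred
rejected = torch.randn(8, 768)  # features of responses annotators rejected
loss = preference_loss(model, chosen, rejected)
loss.backward()
```

Nothing in that loss refers to any external standard of “harm” or “helpfulness”; the annotators’ comparative judgments are the standard.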
So the early value inputs (who provides them, what perspectives they represent, what blind spots they have, etc.) aren’t incidental. And when they become the system’s default behavior, it could be pretty hard to unwind or reshape later for better “PMF”/alignment to new demographics. So absolutely, by using a list of professions to make this point, I definitely minimized the issue and made it feel like a capitalistic / labor-force-oriented problem. My deeper point is that there are a lot of people who don’t share Silicon Valley’s belief systems and cultural norms who will be pretty significantly impacted by alignment decisions in AGI.
Diversification really matters because of this. But it’s not at all about having a bunch of farmers learn to code and build transformers—that’s silly. It’s about preventing value lock-in from a narrow slice of people involved in the early decision-making process about what actually matters in alignment.
Currently, alignment (in the US) involves small RLHF annotator groups, safety-team preferences from a concentrated selection of tech hubs, and SV product-grounded assumptions about what “harm” means and what counts as an “acceptable risk” or “good behavior.” Engineers are highly educated and often decently wealthy, with their own cultural norms and value systems of what matters (i.e., what’s “good” or “cool” or “interesting” and what’s not). This isn’t a bad thing, and this is absolutely not a criticism of engineers or of how important their perspectives are in shaping AGI! My point is just that they only represent a part of the full distribution that would ideally be reflected in AGI alignment: not bad, just narrow.
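As a toy illustration of that narrowness (the group names and approval rates below are invented, not data from any real annotation pool), here’s how the composition of a small annotation panel shifts the majority-vote labels that become the training signal:

```python
# Toy illustration (made-up numbers): how annotator-pool composition shifts
# the majority-vote labels a reward model is trained on.
import random

random.seed(0)

# Hypothetical probability that an annotator from each group rates some
# borderline response as "acceptable" -- these numbers are invented.
approval_rate = {"tech_hub": 0.8, "broader_public": 0.4}

def majority_label(pool_weights: dict, n_annotators: int = 5) -> int:
    """Sample a small annotation panel from the pool and return its majority vote."""
    groups = list(pool_weights)
    weights = [pool_weights[g] for g in groups]
    votes = [
        random.random() < approval_rate[random.choices(groups, weights)[0]]
        for _ in range(n_annotators)
    ]
    return int(sum(votes) > n_annotators / 2)

def label_rate(pool_weights: dict, trials: int = 10_000) -> float:
    """Fraction of borderline responses that end up labeled 'acceptable'."""
    return sum(majority_label(pool_weights) for _ in range(trials)) / trials

print("narrow pool :", label_rate({"tech_hub": 1.0, "broader_public": 0.0}))
print("mixed pool  :", label_rate({"tech_hub": 0.2, "broader_public": 0.8}))
```

With these invented rates, a panel drawn entirely from one subpopulation approves the borderline case most of the time, while a mixed panel approves it less than half the time; the reward model only ever sees the resulting labels.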
It’s not just about fairness or optics either; it’s a direct safety issue as well as a limitation on actually achieving AGI. The people currently building these systems have blind spots and possibly brittle, monoculture assumptions, whereas broader representation would help mitigate those risks and catch unanticipated failure modes when the system interacts with radically different human contexts downstream. That’s why I was pointing to the historical precedents, e.g., medicine built around male physiology causing persistent harm to women.
And I totally agree with you that capitalism for sure plays a big role here, and aligning to “number go up” is a real risk. It’s the context in which AGI is being built. That’s part of the problem but not the whole problem. Even if we removed capitalism entirely, you’d still have the safety issue and potentially brittle systems due to narrow input distributions (in terms of the broader, system-level decisions and assumptions). And the context AGI is being built in is actually part of my point too. Today, it’s being built within a context driven by capitalistic forces. But don’t we want AGI alignment that isn’t constrained to working in only one type of socioeconomic system, that could adapt to regime changes in a positive and non-scary way?
So to me it’s an “and”—we should examine and worry about alignment in terms of context (capitalism) and in terms of who decides what matters.
I strongly suspect that representation is achieved by pretraining the systems on large datasets that include nearly all worldviews not filtered away by the data selection teams. Issues like the incident with Gemini putting Black people everywhere, or Sonnet 4.5’s fiction making both couples gay, are likely to originate in the pro-diversity bias of SOTA labs and of Western training data. For comparison, earlier versions of DeepSeek would align with the political views associated with the prompt’s language (e.g., when asked in Russian without web search, they would call the Russian invasion the “SVO” or agree with Russia’s anti-drug stance). Later versions have arguably learned to take Western scientific results as given.
I don’t think this is due to a pro-diversity bias; it’s simply that such pairings are extremely popular in easily available stories: https://archiveofourown.org/works/68352911/chapters/176886216 (9 of the top 10 pairings are M/M, each with 40k+ stories; for reference, Project Gutenberg only has about 76,000 books total). I think this is because M/M romance is a superstimulus for female sexuality in a similar way to how lesbian porn is a superstimulus for male sexuality.
The pro-diversity bias’s main influence seems to be changing the proportion of stories focused on non-white male/male pairings, as you can see here: https://archiveofourown.org/works/27420499/chapters/68826984
Hmm, I tried and failed to reproduce the effect of Claude writing gay couples into its stories: asking Claude itself to write a story in Tomas B.’s style produced a story with no gay couples. Nor did I manage to elicit this quirk from DeepSeek, the Chinese AI (which I asked twice), from Grok 4 (which generated a story containing the phrase “Sarah, who’d left me for a guy”), or from GPT-5.1-thinking (whose story had a male and a female character and no gay couples).
What I don’t understand is how the bias was eliminated.
Yes, great examples of how training data that supports alignment goals matters. But the model’s behaviors are also shaped by RL, SFT, safety filters/inference-time policies, etc., and it will be important to get those right too.
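To sketch that layering (every function below is a stand-in, not a real training stage or any lab’s actual code), the behavior a user finally sees is a composition of stages, and a quirk can be introduced, amplified, or filtered out at any of them:

```python
# Toy, purely structural sketch: the output a user sees is a composition of
# several shaping stages, not just the pretraining data. Everything here is
# illustrative; none of these functions implement a real training stage.
from typing import Callable

def base_model(prompt: str) -> str:
    # Stand-in for sampling from the pretrained distribution.
    return f"[pretraining-prior completion of: {prompt}]"

def sft_layer(generate: Callable[[str], str]) -> Callable[[str], str]:
    # Supervised fine-tuning nudges outputs toward demonstrator style.
    return lambda prompt: generate(prompt) + " [shaped by SFT demonstrations]"

def rlhf_layer(generate: Callable[[str], str]) -> Callable[[str], str]:
    # RLHF further shifts outputs toward what the reward model scored highly.
    return lambda prompt: generate(prompt) + " [shaped by annotator preferences]"

def policy_filter(generate: Callable[[str], str]) -> Callable[[str], str]:
    # An inference-time policy can veto or rewrite outputs after training is done.
    def wrapped(prompt: str) -> str:
        out = generate(prompt)
        return "[refused by inference-time policy]" if "disallowed" in prompt else out
    return wrapped

deployed = policy_filter(rlhf_layer(sft_layer(base_model)))
print(deployed("write a short story"))
```

Which is also part of why it’s hard to tell from the outside where a quirk like the one above was actually eliminated.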