A lot of current safety research focuses on intent alignment: ensuring that an AI learns and seeks to fulfill the intentions of one human user, or of a small group of human users. Since the prospect of developing an AI assistant may be attractive to researchers and lucrative for companies, this type of alignment research may be prioritized over research that tackles the unpleasant realities of balancing the sometimes opposing interests of different groups of people.
In particular, I can imagine a failure scenario in which a small number of corporations use AI to promote their own influence and the growth of their platforms. The interests of the platforms' top users are prioritized over everyone else's, and life genuinely improves for those users. Some of them want to improve life for others, but their efforts are largely confined to their immediate communities, so AI-assisted humans prioritize the development of already-wealthy countries while doing very little to help the global poor or to address problems like climate change, which only worsen. Cures for diseases are created, new technologies are invented, and a few people enjoy profound leisure and luxury. But while life gets better for them, it worsens for people whose interests aren't represented on the platforms, which includes most of humanity.
Meanwhile, AI subtly discourages more people from joining the platforms, having found that one way to satisfy its humans is to make them feel lucky and superior to others. Countries that relied on rich nations for outsourced labor are thrown into disarray as automation accelerates. Food insecurity and starvation loom, worsened by climate change's disruption of agriculture. Human actors release bioengineered diseases, and although AI helps develop countermeasures that hold the death toll to a few million, panic and fear spread through poorer regions while rapid responses and new technologies limit the diseases' impact in rich nations. Constant surveillance and behavioral engineering prevent crime and discourage any coordinated expression of dissent. Most of humanity feels powerless, dehumanized, and alone, while a tiny elite is on track to get the utopia they dreamed of. In the back of their minds, they know that not everyone shares their life of luxury (oh, how lucky they are), but they're confident it's just a matter of time.
The scenario I just described represents the main cluster of outcomes I'm most worried about, although my views will surely change over time. The future above, which might be classified as an s-risk (a risk of astronomical suffering), terrifies me because, judged by the best evidence available to elites, it wouldn't fit the definition of an existential catastrophe: people's lives would be getting better by all sorts of metrics, but only the metrics that elites cared about. It would be a tragic case of Goodhart's Law compounded by the principal-agent problem: the AI optimizes proxies for human flourishing, namely the satisfaction of the platforms' users, while the flourishing of humanity at large decays unmeasured.