Aligned to the leviathan or the citizen?
There’s a thing people in AI safety leave unspoken: if we do align AI successfully (far from a given), we still have the problem of who it’s aligned to.
After nature, governments have been responsible for the largest death counts in human history through war and famine:
WWII: 35-118M
Mongol conquests: 40-80M (Genghis Khan, Kublai Khan, Timur)
Mao Zedong: 14-80M (including the Great Leap Forward famine)
Taiping Rebellion: 20-30M
Stalin: 9-43M (including the Holodomor)
(full list)
The thing that has historically restrained governments during crises, wars, and swings toward extremism is that citizens are necessary. You need people to run the factories, fight the wars, grow the food, operate the bureaucracy. This gives populations leverage even under authoritarian rule, and it’s a big part of why democracies emerged at all.
AI changes that. With AI police, AI managers, AI workers, and AI soldiers, some of the worst episodes in human history would have played out very differently. A government that doesn’t need its citizens for labour or warfare has much less reason to keep them happy, or alive. The balance of power shifts in a way we haven’t seen before.
Most “pause AI” advocacy doesn’t mention pausing or monitoring government military or intelligence work, but it should. Most safety orgs are hesitant to say this because they want to keep working with governments. We are only starting to talk about it, and often in euphemisms: we say “coups” or “dictators” and never mention that our own government is at risk too, even though it’s the only one we have a vote in.
AI should be aligned with people and norms, not with individuals or positions of power. This can become a Schelling point if we can just get it within the Overton window.
My ideas about alignment derive from the dark ages when we talked about “Friendly AI”, and I do not keep up with today’s “AI safety” literature in any systematic way.
But may I point out that today’s literature makes a basic distinction between “intent alignment” and “values alignment”. An “intent-aligned” AI is really good at discerning what its user wants and fulfilling that; whereas a “values-aligned” AI makes its decisions on the basis of values, not just scrupulous obedience.
This can still be mapped onto your dichotomy between “leviathan” and “the people”, e.g. if you can identify different class interests distinct enough to become different value systems. But in general, if your proposal is to align AI with “the people”, you face the problem that the people are actually a mass of individuals with distinct and often contradictory values.
I think within the field of AI safety, there is in fact considerable awareness of the problem of AI-enhanced abuse of power. AI dictatorship is a long-known form of “s-risk”, though lately people have talked more about the CEOs of AI companies becoming the absolute rulers of the world. It’s only as America and China incorporate AI into their militaries and governments that people are again beginning to talk about governmental dictatorship.
However, we do not know how stable AI-enhanced human dictatorship is, compared to outright takeover by AIs themselves. Delegate everything to intelligent machines, and there’s a good chance that some purpose native to the machines will emerge and overwhelm the diktats of the human dictator.
So a lot of people will agree with you that AI needs to be aligned to “norms” rather than to individuals. But which norms?
By the way, note that the AI faction allied with Trump 2.0 policy is overtly against any form of universal values alignment, because they consider that to be dictatorship. e/acc co-founder Gill Verdon seems to have the most thought-out version of this: he argues that AI civilization as a whole needs to orient itself around maximizing energy use (and will do so in a Darwinian fashion). The argument is that values are complex and constantly shifting anyway, whereas material prosperity follows energy abundance and is the more objective indicator of progress.