There’s a thing people in AI safety leave unspoken: if we do align AI successfully (far from a given), we still have the problem of who it’s aligned to.
My ideas about alignment derive from the dark ages when we talked about “Friendly AI”, and I do not keep up with today’s “AI safety” literature in any systematic way.
But may I point out that today’s literature makes a basic distinction between “intent alignment” and “values alignment”. An “intent-aligned” AI is really good at discerning what its user wants and fulfilling that; whereas a “values-aligned” AI makes its decisions on the basis of values, not just scrupulous obedience.
This can still be mapped onto your dichotomy between “leviathan” and “the people”, e.g. if you can identify different class interests distinct enough to become different value systems. But in general, if your proposal is to align AI with “the people”, you face the problem that the people are actually a mass of individuals with distinct and often contradictory values.
I think within the field of AI safety, there is in fact considerable awareness of the problem of AI-enhanced abuse of power. AI dictatorship is a long-known form of “s-risk”, though lately people have talked more about the CEOs of AI companies becoming the absolute rulers of the world, and it’s only as America and China incorporate AI into their militaries and governments that people have again begun to talk about governmental dictatorship.
However, we do not know how stable AI-enhanced human dictatorship is, compared to outright takeover by AIs themselves. Delegate everything to intelligent machines, and there’s a good chance that some purpose native to the machines will emerge and overwhelm the diktats of the human dictator.
So a lot of people will agree with you that AI needs to be aligned to “norms” rather than to individuals. But what norms, whose norms?
By the way, note that the AI faction allied with Trump 2.0 policy is overtly against any form of universal values alignment, because they consider that to be dictatorship. e/acc co-founder Guillaume Verdon seems to have the most thought-out version of this, when he argues that AI civilization as a whole needs to orient itself around maximizing energy use (and will do so darwinistically). The argument is that values are complex and constantly shifting anyway, whereas material prosperity follows energy abundance and is the more objective indicator of progress.