In the near term, when we are still talking about things like “the person who bought the AI to help run the traffic lights” rather than “the person who unleashed AI to write its values upon the stars,” I think it is actually totally fine to try to build AIs that are “aligned” (in their own not-too-bright way) with the person who bought them.
It is not the AI the army buys to control tanks that I’m worried about aligning to the broad swath of human values. It is the AI that gets built with no need for a buyer, by research labs who recognize that it’s going to have a huge impact on the future.
Okay, with that out of the way—is such a notion of “alignment” feasible, given that humans oppose each other about stuff?
Yes.
The world could be better than it is today, in ways that would please almost everyone. This is all I really want from aligned AI. I’m reminded of “Transhumanism as Simplified Humanism.” There is someone dying of cancer. Should they be saved? Yes! No trick question!
Sure, certain human values for dominance, or killing, or even just using resources unsustainably might forever be impossible to fulfill all the time. So don’t try to do impossible things, just build an AI that does the good things that are possible!
How to do this in practice, I think, looks like starting out with a notion of “the broad swath of human values” that defines that term the way the designers (aided, realistically, by a random sample of Mechanical Turkers) would define “human values,” and then updating that picture based on observing and interacting with humans out in the real world.
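The process described above — start from a designer-supplied notion of “human values” and refine it through observation — can be read as a Bayesian updating loop. Here is a minimal, purely illustrative sketch under that reading; the candidate value models, their names, and the likelihood numbers are all my assumptions, not anything specified in the comment:

```python
# Hypothetical sketch: treat "the broad swath of human values" as a small
# set of candidate value models, start from a designer-supplied prior,
# and update it from observed human approval/disapproval signals.
# Models and likelihoods below are toy assumptions for illustration only.

def update_prior(prior, likelihoods, observation):
    """One Bayesian update: P(model | obs) is proportional to P(obs | model) * P(model)."""
    posterior = {m: p * likelihoods[m](observation) for m, p in prior.items()}
    total = sum(posterior.values())
    return {m: p / total for m, p in posterior.items()}

# Two toy candidate models of what humans value.
likelihoods = {
    "cure_disease_good": lambda obs: 0.9 if obs == "approve" else 0.1,
    "cure_disease_bad":  lambda obs: 0.2 if obs == "approve" else 0.8,
}

# Designer-supplied starting prior (the "initial notion" of human values).
prior = {"cure_disease_good": 0.5, "cure_disease_bad": 0.5}

# Observing humans approve of curing disease shifts weight accordingly.
for obs in ["approve", "approve", "approve"]:
    prior = update_prior(prior, likelihoods, obs)

print(round(prior["cure_disease_good"], 3))  # → 0.989
```

The point of the sketch is only that the designers’ initial definition need not be final: the system’s picture of “human values” can shift toward what observed humans actually endorse.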
Charlie—thanks for your comment.

I agree that, in principle, ‘The world could be better than it is today, in ways that would please almost everyone.’
However, in practice, it is proving ever more difficult to find any significant points of agreement (value alignment between people and groups) on any issue that becomes politically polarized. If we can’t even agree to allocate any significant government research effort to promoting longevity and regenerative medicine, for example, why would everyone be happy about an AI that invents regenerative medicine? The billions of people caught up in the ‘pro-death trance’ (who believe that mortality is natural, good, and necessary) might consider that AI to be evil, dystopian, and ‘misaligned’ with their deepest values.
Increasingly, every human value is turning political, and every political value is turning partisan—often extremely so (especially in the US). I think that once we step outside our cultural bubbles, whatever form they take, we may be surprised and appalled at how little consensus there actually is among current humans about what a ‘good AI’ would value, what it would do, and whose interests it would serve.
I think that either of the following would be reasonably acceptable outcomes:
(i) alignment with the orders of the relevant human authority, subject to the Universal Declaration of Human Rights as it exists today and other international human rights law as it exists today;
(ii) alignment with the orders of relevant human authority, subject to the constraints imposed on governments by the most restrictive of the judicial and legal systems currently in force in major countries.
Alignment doesn’t mean that AGI is going to be aligned with some perfect distillation of fundamental human values (which doesn’t exist) or the “best” set of human values (on which there is no agreement); it means that a range of horrible results (most notably human extinction due to rational calculation) is ruled out.
That my values aren’t perfectly captured by those of the United States government isn’t a problem. That the United States government might rationally decide it wanted to kill me and then do so would be.
Human rights law is so soft and toothless that having something rigidly and thoroughly follow it would be such a change in practice that I would not be surprised if that itself counted as an alignment failure.

There is also the issue that if the human authority is not itself subject to those rights, then making the silicon subject to them renders it relatively impotent as an instrument of the human authority’s agency.

I am also wondering about the difference between the US carrying out a drone strike on home soil (or is foreign soil just as bad?) versus fully formal capital punishment carried out over a decade. Conscientious objection to current human systems seems a bit of a pity and risks creating a rebel. And enforcing the most restrictive bits of other countries’ and cultures’ systems would be quite transformative: finding overnight that capital punishment is unconstitutional (or “worse”) would have quite a lot of ripple effects.