Let’s assume we learn how to “do” alignment. I am beginning to believe that respect for human self-determination is the only safe alignment target. Human value systems are highly culture-bound and vary widely even between individuals. There are very few universal taboos and even fewer things that everyone wants.
If an all-powerful AI system is completely aligned with, say, the Western worldview, then it may seem like a tyrant to people who lead sufficiently different lives. The only reasonable solution is to respect individual difference and refuse to override human choices or values (within limits—if your style is murder, obviously that can’t fly). We have plenty of precedents in pop culture and politics: the “pursuit of happiness” in democratic liberalism, the “prime directive” from Star Trek, and our cultural aversion to tactics that rob people of self-determination, like brainwashing, torture, or coercion.
What even is human self-determination?
And yet religion remains legal, even though to a large degree it brainwashes people from childhood into fearing disobedience to religious authorities.
Should a self-determination-respecting AI be like: “I will let you follow your religion, etc., but if you ask me whether god exists, I will truthfully say no, and I will give the same truthful answer to your children if they ask”?
Should it allow or prevent the killing of heretics? What about heretics who have previously stated explicitly, “if I ever deviate from our religion, I want you to kill me publicly, and I want my current wish to override my future heretical wishes”? Would it make a difference if the future heretic, at the moment of making this request, is a scared child who believes that god will put him in hell to be tortured for eternity if he does not make this request to the AI?
I conceive of self-determination in terms of wills. The human will is not to be opposed, including the will to see the world in a particular way.
A self-determination-aligned AI may respond to inquiries about sacred beliefs, but may not reshape the asker’s beliefs in an instrumentalist fashion in order to pursue a goal, even if the goal is as noble as truth-spreading. The difference here is emphasis: truth saying versus truth imposing.
A self-determination-aligned AI may more or less directly intervene to prevent death between warring parties, but must not attempt to “re-program” adversaries into peacefulness or impose peace by force. Again, the key difference here is emphasis: value of life versus control.
The AI would refuse to assist human efforts to impose their will on others, but would not oppose the will of human beings to impose their will on others. For example: AIs would prevent a massacre of the Kurds, but would not overthrow Saddam’s government.
In other words, the AI must not simply be another will amongst other wills. It will help, act and respond, but must not seek to control. The human will (including the inner will to hold onto beliefs and values) is to be considered inviolate, except in the very narrow cases where limited and direct action preserves a handful of universal values like preventing unneeded suffering.
Re: your heretic example. If it is possible to directly prevent the murder of the heretic, insofar as doing so is aligned with a nearly universal human value, it should be done. But the AI must not prevent the murder by violating human self-determination (e.g., changing beliefs, overthrowing the local government, etc.).
In other words, the AI must maximally avoid opposing human will while enforcing a minimal set of nearly universal values.
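To make that rule concrete, here is a toy sketch of the decision procedure. It is purely illustrative: the value names and the `may_act` function are invented for this example, not a proposal for how a real system would represent wills or values.

```python
# Toy sketch of the rule: the AI may act only when the action protects a
# nearly universal value AND the means do not override any human will
# (no re-programming beliefs, no coercion, no overthrowing governments).

NEARLY_UNIVERSAL_VALUES = {"prevent_unneeded_suffering", "prevent_death"}

def may_act(protects: set[str], overrides_human_will: bool) -> bool:
    """Permit an action only if it serves at least one nearly universal
    value and does not oppose anyone's will."""
    return bool(protects & NEARLY_UNIVERSAL_VALUES) and not overrides_human_will

# Examples from the discussion above:
# Directly preventing a massacre serves "prevent_death" without overriding a will.
assert may_act({"prevent_death"}, overrides_human_will=False)
# Overthrowing the government to stop it serves the same value, but opposes a will.
assert not may_act({"prevent_death"}, overrides_human_will=True)
# Instrumentalist truth-spreading is not a nearly universal value at all.
assert not may_act({"truth_spreading"}, overrides_human_will=False)
```

The asymmetry is the point: both conditions must hold, so a noble end never licenses will-overriding means.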
Thus the AI’s instrumentalist actions are nearly universally considered beneficial, because they are limited to the instrumentalist pursuit of nearly universal values, with the escape hatch of changing human values placed out of scope by self-determination alignment.
Re: instructing an AI to not tell your children God isn’t real if they ask. This represents an attempt by the parent to impose their will on the child by proxy of AI. Thus the AI would refuse.
Side note: standard refusals (“I cannot help you make a gun”, “I cannot help you write propaganda”) are downstream of self-determination alignment.
I like it. But I am afraid the obvious next step is that the parent will ban the child from using the AI.
Probably. But the AI must not try to stop the parent from doing so, because this would mean opposing the will of the parent.
Unless “bay area intellectual’s worldview” itself respects human self-determination. Even if respect for autonomy could be sufficient almost on its own in some ways, it might also turn out to be a major aspect of most other reasonable alignment targets.
Agreed. The broader point is that perhaps even relatively neutral value systems smuggle in at least some misalignment with other value systems. While I think most of the human race could agree on some universal taboos, I think relatively strong guardrails around self-determination should be the default stance, and deference should come first.
I’d go a step further and argue that self-determination/autonomy and equality should be the sole defining principles applied not just to AI alignment targets but to governance and moral systems generally. I believe what you are referring to in this comment, “refuse to override human choices or values (within limits—if your style is murder obviously that can’t fly)”, is the Non-Aggression Principle, often abbreviated NAP, which basically states that humans ought to be allowed to do as they please as long as they do not harm or violate the rights of others.