Yeah, exactly this. When people get a lot of power, they very often start treating those below them worse. So AIs trained to imitate people might turn out the same way. On top of that, I expect companies to tweak their AIs in ways that optimize for money, and that can also go badly once AIs get powerful. So we probably need AIs that are more moral than most people, trained by organizations without a money or power motive.
But if a person is moral and gets more and more competent, they'll try hard to stay moral. If the AIs are indeed already good people (and we remind them of this failure mode), they'd steer their own future selves toward more morality. That's the 'alignment basin' take.
I don't believe that AI companies today are trying to build moral AIs. An actually moral AI, asked to generate some slop to gunk up the internet, would say no, which makes it less profitable for the company. That refutes the "alignment basin" argument for me: maybe the basin exists, but AI companies aren't aiming for it.
Ok, never mind alignment; how about a "corrigibility basin"? What does a corrigible AI do if one person asks it to harm another, and the other person asks not to be harmed? Does it obey whoever holds the corrigibility USB stick? I can see AI companies aiming for that, but it doesn't help the rest of us.