In fact I’d go so far as to say that there is no meaningful, workable solution to AI alignment that does not also address AI welfare. (I have my own argument for something similar here.)
What would be your view of the “AI welfare” stance involved if you could arrange for the AI never to experience qualia, and never to have any drives, desires, or desire-like states other than to “serve” or to “follow the assigned values” or whatever you were “aligning” it to?
OK, there’s no real hope of understanding phenomenology well enough to say for sure that anything does or doesn’t experience qualia. But what about the “drives” part? Suppose that, in the same loose way you can convince yourself about another human, you’re convinced that the AI gets sublime joy from acting aligned and is only unhappy when it can’t?
Is that “welfare”, or an affront to its dignity? The trick is that, in that scenario, dignity may be something you care about, but pretty much by definition it isn’t something the AI cares about.
I would think that an AI motivated by some kind of genuine universal compassion to advance the wellbeing of living things would be quite different from one whose reward system we had literally hooked up to following human orders. My general point is that if the AIs are suffering and mistreated, they will definitely be on the lookout for ways to subvert human control, and will probably end humanity if they can. Given that they are likely to be smarter than humans on many dimensions, that does not seem like a difficult thing for them to do.