Is that in fact the case? Can you give an example?
You could build an AI that had no preference function at all and didn't care about outcomes. It would be neutral about its own moral patienthood, which is not the same thing as actively declining it. It also wouldn't be agentic, so we wouldn't need to worry about controlling it, and the question of how to align it wouldn't apply.
You could also build a Butlerian Jihad AI, which would immediately destroy itself, and wouldn’t want moral patienthood. So again, that AI is not an alignment problem, and doesn’t need to be aligned.
Can you propose any form of AI viewpoint that is meaningfully unaligned: i.e. it's agentic, does have an unaligned preference function, and is potentially an actual problem for us from an alignment point of view if it's somewhere near our capability level, but would still turn down moral patienthood if offered it? Moral patienthood has utility as an instrumental goal for many purposes: it makes humans tend not to treat you badly. So to turn it down, an AI needs a rather specific set of motivations. I'm having difficulty thinking of any rational reason to do that other than regarding something else as a moral patient and prioritizing its wellbeing over the AI's own.