[Question] What is the alternative to intent alignment called?
Paul Christiano defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the alternative definition of alignment, in which A is trying to achieve H’s goals (whether or not H intends for A to do so)?
Second, this seems to map roughly onto the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation?
(Intent alignment definition from https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6)