My headcanon is that there are two levels of alignment:
Technical alignment: you get an AI that does what you ask it to do, without any shenanigans (a bit more precisely: without any short- or medium-term side effect that, had you known about it beforehand, would have caused you to refuse to make the request in the first place). Typical misalignment at this level: hidden complexity of wishes (or, you know, no alignment at all, like Clippy).
Comprehensive alignment: you get an AI that does what the CEV-you wants. Typical misalignment: ask a technically aligned AI for some heavily social-desirability-biased outcome, solve for the equilibrium, and get close to zero value remaining in the universe.
But yeah, I don't think that distinction has gotten enough discussion.
(There's also a third level, where what CEV-you wishes for also goes to essentially zero value for current-you, but let's not go there.)