My headcanon is that there are two levels of alignment:
Technical alignment: you get an AI that does what you ask it to do, without any shenanigans (a bit more precisely: without any short- or medium-term side effect that, had you known about it beforehand, would have caused you to refuse to make the request in the first place). Typical misalignment at this level: hidden complexity of wishes (or, you know, no alignment at all, like Clippy).
Comprehensive alignment: you get an AI that does what the CEV-you wants. Typical misalignment: ask a technically aligned AI for some heavily social-desirability-biased outcome, solve for the equilibrium, and get close to zero value remaining in the universe.
But yeah, I don't think that distinction has gotten enough discussion.
(There's also a third level, where what CEV-you wishes for also goes to essentially zero value for current-you, but let's not go there.)