Hmm, I feel like there’s some misunderstanding here maybe?
What you’re calling “strong alignment” seems more like what most folks I talk to mean by “alignment”. What you call “alignment” seems more like what we often call “corrigibility”.
You’re right that corrigibility is not enough to get alignment on its own (i.e., that your “alignment” is not enough to get “strong alignment”), but it is necessary.
I have the opposite impression. “Alignment” is usually interpreted as “do whatever the person who gave the order expected”, and what the author calls “strong alignment” is an aligned AGI ordered to implement CEV.
I think this is because there’s an active watering down of terms happening in some corners of AI capabilities research, a result of tackling only subproblems of alignment without being abundantly clear that these are subproblems rather than the whole thing.