I keep running into conceptual confusion around the term “alignment,” particularly when reading older Less Wrong posts. Some people say “aligned AI” and mean an AI that works for human flourishing; some say an AI “is aligned” if it reliably advances the intended objectives of some person or group (and doesn’t have a secret set of goals / isn’t scheming); and still others use “alignment” to mean something like “the ability of any system to reliably work toward some pre-defined goal.” I usually have to work out on the spot which meaning is intended, which is annoying given that the implications of each are very different.
Is there one commonly accepted definition? Is this confusion just a thing we’ve all accepted?
I’m a little surprised by the number of disagree reacts, given that no one has replied.