“Alignment,” used in a sense where current AI is aligned—a sort of “it does basically what we want, within its capabilities, with some occasional mistakes that don’t cause much harm” alignment—is simply easier at lower capabilities, where humans can do a relatively good job of overseeing the AI, not just in deployment but also during training. Systematic flaws in human oversight during training lead (under current paradigms) to misaligned AI.
I’m big on point #2 feeding into point #1.