“Alignment,” used in a sense where current AI is aligned—a sort of “it does basically what we want, within its capabilities, with some occasional mistakes that don’t cause much harm” alignment—is simply easier at lower capabilities, where humans can do a relatively good job of overseeing the AI, not just in deployment but also during training. Systematic flaws in human oversight during training lead (under current paradigms) to misaligned AI.
I’m big on point #2 feeding into point #1.