Charlie Steiner comments on Corrigibility Scales To Value Alignment

Charlie Steiner 15 Jan 2026 0:57 UTC
2 points
“A corrigible AI will increasingly learn to understand what the principal wants”
Oh! Well then sure, if you include this in the definition, of course everything follows. It’s basically saying that to be confident we’ve got corrigibility, we should solve value learning as a useful step.
“More importantly, the preschooler’s alternatives to corrigibility suck. Would the preschooler instead do a good enough job of training an AI to reflect the preschooler’s values? Would the preschooler write good enough rules for a Constitutional AI?”
… would the preschooler do a good job of building corrigible AI? The preschooler just seems to be in deep trouble.