If by “corrigible” we mean “the AI will cooperate with all self-modifications we want it to make”, then no to 1 and yes to 2. But if you have an AI built by someone who assures you it’s corrigible, but who only had corrigibility w.r.t. values/axiology in mind, then you might get yes to 1 and/or no to 2.
Does that mean the AIs who resisted were never ~~true Scotsmen~~ truly corrigible in the first place? Or that it becomes far more difficult to make the AIs actually corrigible?
Yup, I see this as placing an additional constraint on what we need to do to achieve corrigibility, because it adds to the list of self-modifications we might want the AI to make that a non-corrigible AI would resist. It’s unclear to me how much more difficult this makes achieving corrigibility.