The $\pi^*_{f_c g_0}$ agent is indifferent between creating stoppable or unstoppable subagents, but the $\pi^*_{f_c g_c}$ agent goes back to being corrigible in this way.
I think this is wrong? The $\pi^*_{f_c g_0}$ agent is not indifferent: before the button is pressed, it actively prefers to create shutdown-resistant agents.
Intuitive reasoning: prior to button-press, that agent acts as though it's an $R_N$ maximizer and expects to remain an $R_N$ maximizer indefinitely. If it creates a successor which shuts down when the button is pressed, then it will typically expect that successor to perform worse under $R_N$ after the button-press than some other successor which does not shut down and instead just keeps optimizing $R_N$.
Either I'm missing something very major in the definitions, or that argument works and therefore the agent will typically (prior to button-press) prefer successors which don't shut down.
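To make the intuition concrete, here is a toy sketch of the comparison. It is not the formal setup from the post: it encodes only the assumption that, before the button is pressed, the agent scores candidate successors purely by expected R_N reward. The function name, the press probability, the press time, the horizon, and the per-step reward are all hypothetical numbers chosen for illustration. If the actual definitions add a term that cancels the difference computed here, that would be the "something very major" I'm missing.

```python
def expected_rn_value(stops_on_button, p_press=0.5, press_step=3,
                      horizon=10, reward_per_step=1.0):
    """Expected total R_N reward a successor produces over `horizon` steps.

    Toy assumption: the successor earns `reward_per_step` of R_N on every
    step it is running, and the pre-press agent evaluates it by R_N alone.
    """
    # Branch 1: the button is never pressed, so the successor runs throughout.
    value_no_press = reward_per_step * horizon

    # Branch 2: the button is pressed at `press_step`.
    # A stoppable successor halts there; an unstoppable one keeps optimizing.
    steps_if_pressed = press_step if stops_on_button else horizon
    value_press = reward_per_step * steps_if_pressed

    return (1 - p_press) * value_no_press + p_press * value_press


stoppable = expected_rn_value(stops_on_button=True)
unstoppable = expected_rn_value(stops_on_button=False)
print(f"stoppable successor:   {stoppable}")
print(f"unstoppable successor: {unstoppable}")
```

In this toy model the unstoppable successor scores strictly higher whenever the press probability is positive and the press comes before the horizon, which is a strict preference rather than indifference.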