Does that scheme get around the contradiction? I guess it might if you somehow manage to get it into the utility function, but that seems a little fraught / you’re weakening the connection to the base incorrigible agent. (The thing that’s nice about 5, according to me, is that you do actually care about performing well as well as being corrigible; if you set your standard as being a corrigible agent and only making corrigible subagents, then you might worry that your best bet is being a rock.)
Does that scheme get around the contradiction? I guess it might if you somehow manage to get it into the utility function, but that seems a little fraught / you’re weakening the connection to the base incorrigible agent. (The thing that’s nice about 5, according to me, is that you do actually care about performing well as well as being corrigible; if you set your standard as being a corrigible agent and only making corrigible subagents, then you might worry that your best bet is being a rock.)