I think informed oversight fits better with MtG white than it does with boxing. I agree that the three main examples are boxing like, and informed oversight is not, but it still feels white to me.
I do think that corrigibility done right is a thing that is in some sense less agentic. I think that things that have goals outside of them are less agentic than things that have their goals inside of them, but I think corrigibility is stronger than that. I want to say something like a corrigible agent not only has its goals partially on the outside (in the human), but also partially has its decision theory on the outside. Idk.
I think informed oversight fits better with MtG white than it does with boxing. I agree that the three main examples are boxing like, and informed oversight is not, but it still feels white to me.
I do think that corrigibility done right is a thing that is in some sense less agentic. I think that things that have goals outside of them are less agentic than things that have their goals inside of them, but I think corrigibility is stronger than that. I want to say something like a corrigible agent not only has its goals partially on the outside (in the human), but also partially has its decision theory on the outside. Idk.