What if, when you want to use AI to cooperate, you keep the AI corrigible but require all human parties to the agreement to consent to any override? The important thing about corrigibility is the ability to correct catastrophic errors in the AI's behavior, right?