I don’t really believe in corrigibility as a thing that could hold up under much optimization pressure at all. It’s not impossible to make a corrigible ASI, but my guess is that to build one you first need an aligned ASI to build it for you, so as a target it’s pretty useless.
My guess is that puts me in enough disagreement to qualify for your question?
Neat! Yeah, I think it does!