Obedience/​deference seem the obvious proxies of corrigibility.
A proxy is supposed to be observable so that it can be used for the purpose it is to be used for.
What use do you have for a measure of corrigibility, and how do you intend to observe obedience/​deference for that use?
Obedience/​deference seem the obvious proxies of corrigibility.
A proxy is supposed to be observable so that it can be used for the purpose it is to be used for.
What use do you have for a measure of corrigibility, and how do you intend to observe obedience/​deference for that use?