I think you hit on two of the most challenging parts of corrigibility: manipulation and dependency. It’s hard to define these clearly or to make coherent rules about them. In particular, I think figuring out at what point ‘influence’ becomes ‘manipulation’ is an important goal for a workable theory of corrigibility.