Having skimmed some of the posts in the sequence, I am not persuaded that corrigibility now looks like an engineering problem rather than a problem that needs one or more major theoretical breakthroughs.
The point about corrigibility MIRI keeps making is that it’s anti-natural, and Max seems to agree with that.
(Seems like this is a case where we should just tag @Max Harms and see what he thinks in this context)