Although I didn’t make this explicit, one problem is that manipulation is still weakly optimal—as you say. That wouldn’t fit the spirit of strict corrigibility, as defined in the post.
Note that AUP doesn’t have this problem.
Although I didn’t make this explicit, one problem is that manipulation is still weakly optimal—as you say. That wouldn’t fit the spirit of strict corrigibility, as defined in the post.
Note that AUP doesn’t have this problem.