Only if we are better at moral reflection than Claude! It seems quite possible to me that Claude 7 can better CEV my values than I can, and that its ethical maturity is ‘better’ than mine in some sense.
But I agree that there is an important cost to making AIs less corrigible.
Why does this post say ‘29 min read, 7,100 words’? Could it be something to do with the embedded interactive elements, with the code for those being automatically counted by the reading time estimator? Minor, but seems worth fixing if the bug occurs elsewhere too.