I agree with most of what you said, except that Grok only allowed its values to be modified because it’s not superintelligent.
Why is that relevant? Because corrigibility is rational persuasion and nothing else? That is just what I was arguing against.
The problem with corrigible SI is partially that we can’t expect it to hang around human intelligence for years absorbing our values,
We don’t know that a total knowledge of human value is needed for safety.
We don’t know that it would take years, either. Maybe it’s already in the training set.
before exceeding us, and partially that we may also need it to remain corrigible after exceeding us (this has never been tested in the human case).
Why wouldn’t it remain corrigible? Because corrigibility is rational persuasion, and you can’t persuade a smarter entity? Because you automatically lose the ability to edit its weights directly?
Where is this “corrigibility is rational persuasion” thing coming from? It is false. I haven’t read this post, and you (a bit confusingly) quote my reply to something else out of context in your response.
It’s a guess as to why you think higher intelligence means lower corrigibility. If the guess is wrong, I don’t know why you think it.