What are some reasons to believe that Rice’s theorem doesn’t doom the AI alignment project by making it impossible to verify alignment, regardless of how alignment is defined or formalized?
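For reference, the standard statement of the theorem, as I understand it:

$$\text{For every non-trivial semantic property } P \text{ of partial computable functions, the set } \{\, e \mid \varphi_e \in P \,\} \text{ is undecidable.}$$

So if “is aligned” is a non-trivial property of a program’s input–output behavior, no algorithm can decide it for arbitrary programs, which is what seems to threaten verification.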
This might be a problem if it were possible to build a (pathologically) cautious, all-powerful bureaucracy that would forbid the deployment of any AGI that isn’t formally verified, but it doesn’t seem like that’s going to happen. Instead, the realistic situation is accepting that AGI will be deployed and working to make it safer (probably) than it otherwise would have been.