It seems to me that Rice’s theorem implies that it is impossible for there to be an “isAligned” function to verify an AI’s alignment, independent of how you define alignment.
Rice’s theorem says that no general procedure can tell, for an arbitrary program, whether it adds together two natural numbers, prints the answer, and terminates. Yet for many specific programs you can prove that this is what they do, or you can make it so by construction, choosing a program that has that behavior. It’s never relevant to anything in practice.
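To make the contrast concrete, here is a minimal Python sketch. It assumes programs are modeled as Python callables, and the names `adds_and_prints` and `wrap` are hypothetical illustrations, not taken from any library. `adds_and_prints` has the property by construction. `wrap` is the standard reduction behind Rice’s theorem: it builds a program that has the adder behavior if and only if `p` halts. A general decider for that behavior, applied to `wrap(p)`, would therefore decide halting, which is impossible.

```python
from typing import Callable


def adds_and_prints(a: int, b: int) -> None:
    """Correct by construction: adds two naturals, prints the sum, terminates."""
    print(a + b)


def wrap(p: Callable[[], None]) -> Callable[[int, int], None]:
    """Hypothetical reduction: the returned program behaves exactly like
    adds_and_prints if p halts, and never terminates otherwise."""
    def q(a: int, b: int) -> None:
        p()                    # loops forever if p never halts
        adds_and_prints(a, b)  # reached only if p halted
    return q


if __name__ == "__main__":
    adds_and_prints(2, 3)      # prints 5
    wrap(lambda: None)(2, 3)   # p halts immediately, so this also prints 5
```

Proving a fixed program like `adds_and_prints` correct is easy. What fails is a single procedure that gets the answer right for every possible `p` fed through `wrap`.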