You can’t really tell whether something smarter than you is behaving correctly. In the end, a non-self-modifying AI checking whether a self-modifying sub-AI is behaving correctly isn’t much different, from a safety perspective, from a human checking whether the self-modifying AI is behaving correctly.