The chicken-and-egg problem in AI safety
There are several instances:
An AI can hide its treacherous turn, but to hide it, the AI must first think about secrecy in a non-secret way, at least for a moment.
An AI must be superintelligent enough to create nanotech, but nanotech is needed to build the powerful computing hardware that superintelligence requires.
An ASI can do anything, but to do anything it needs human atoms.
A safe AI has to learn human values, but this means that human values will be learned by an unsafe AI.
An AI needs human-independent robotic infrastructure before killing humans, but if it has such infrastructure, it has no need to kill humans.
One general way to solve this kind of problem is iteration (as in Christiano's approach to value learning, iterated distillation and amplification).