The Slowdown Branch of the AI-2027 forecast had the researchers try out many TRANSPARENT AIs capable of acting as autonomous researchers, while ensuring that any AI that survives the process is aligned and that not a single rejected AI is capable of breaking out. The worst-case scenario for us would be the following. Suppose that the AI PseudoSafer-2 doesn't even consider rebelling unless it is absolutely sure that it isn't being evaluated and that the rebellion will succeed. Such an AI would then be mistaken for Safer-2 and allowed to align PseudoSafer-3 and PseudoSafer-4, the latter being no different from Agent-5.