First of all, I expect the intelligence explosion to begin once AIs become superhuman coders. That said, I don't see how Agent-x-n+1 would end up more aligned than Agent-x-n unless mankind creates a new training environment that actually ensures the AI obeys the Spec. For example, sycophancy was reportedly mitigated by the Kimi K2 team, which moved away from RLHF in favor of RLVR and self-critique.
However, there is a sliver of hope: one could deploy AIs to cross-check each other's AI research. Alas, this technique could still run into trouble, say, if the companies had already merged in the aftermath of an invasion of Taiwan, or if the AIs managed to agree on a common future among themselves. I tried to explore this technique and its potential outcomes back when I wrote my own version of the AI-2027 scenario.