To the extent that AIs are not already robustly aligned superintelligences, the priorities they would assign to AI-risk-related projects on their own initiative might be suboptimal for our purposes. If humans already have their R&D priorities straight (based on previous research with humans substantially in the loop), they might be able to keep the AIs working on the right things, even if the AIs lack sufficient propensity to go there spontaneously.