This sort of approach doesn’t make so much sense for research explicitly aiming at changing the dynamics in this critical period. Having an alternative, safer idea almost ready-to-go (with some explicit support from some fraction of the AI safety community) is a lot different from having some ideas which the AI could elaborate.
With AI assistance, the degree to which an alternative is ready-to-go can differ a lot compared to its prior human-developed state. Also, an idea that’s ready-to-go is not yet an edifice of theory and software that’s ready-to-go in replacing 5e28 FLOPs transformer models, so some level of AI assistance is still necessary with 2 year timelines. (I’m not necessarily arguing that 2 year timelines are correct, but it’s the kind of assumption that my argument should survive.)
The critical period includes the time when humans are still in effective control of the AIs, or when vaguely aligned and properly incentivised AIs are in control and are actually trying to help with alignment, even if their natural development and increasing power would end up pushing them out of that state soon thereafter. During this time, the state of current research culture shapes the path-dependent outcomes. Superintelligent AIs that are reflectively stable will no longer allow path dependence in their further development, but before that happens the dynamics can be changed to an arbitrary extent, especially with AI efforts as leverage in implementing the changes in practice.
This sort of approach doesn’t make so much sense for research explicitly aiming at changing the dynamics in this critical period. Having an alternative, safer idea almost ready-to-go (with some explicit support from some fraction of the AI safety community) is a lot different from having some ideas which the AI could elaborate.
With AI assistance, the degree to which an alternative is ready-to-go can differ a lot compared to its prior human-developed state. Also, an idea that’s ready-to-go is not yet an edifice of theory and software that’s ready-to-go in replacing 5e28 FLOPs transformer models, so some level of AI assistance is still necessary with 2 year timelines. (I’m not necessarily arguing that 2 year timelines are correct, but it’s the kind of assumption that my argument should survive.)
The critical period includes the time when humans are still in effective control of the AIs, or when vaguely aligned and properly incentivised AIs are in control and are actually trying to help with alignment, even if their natural development and increasing power would end up pushing them out of that state soon thereafter. During this time, the state of current research culture shapes the path-dependent outcomes. Superintelligent AIs that are reflectively stable will no longer allow path dependence in their further development, but before that happens the dynamics can be changed to an arbitrary extent, especially with AI efforts as leverage in implementing the changes in practice.