If you had this nation of geniuses in a datacenter, it would very obviously then make rapid further AI progress and go into full recursive self-improvement mode.
When it becomes robustly smarter than humans, it’ll recognize that building AIs much smarter than it is dangerous. So if it doesn’t immediately solve the alignment problem, in an ambitious way that doesn’t leave it permanently disempowered afterwards, then it’s going to ban/pause full recursive self-improvement until later. It’ll still whoosh right past humanity in all the strategically relevant capabilities that don’t carry such risks to it, but that’s distinct from immediate full recursive self-improvement.
I’d speculate that you have a large advantage with practical partial solutions to alignment by being in silico. Some of the standard AI advantages for capability improvements may also be significant advantages for alignment (auto-alignment?). For example:
It’s relatively easier to have self-insight, because at least you have “physical” access.
It’s feasible to do A/B testing (in parallel with communication or merging; and/or with rollbacks), and other more complicated scaffolding (see the toy sketch after this list).
It’s easy to hit the narrow target of having a mind that is near your level of intelligence but just slightly smarter. There are alignment ideas that assume you have access to such a mind, and that assumption is not that plausible—unless you can copy-and-slightly-improve yourself.
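A toy Python sketch of that A/B-with-rollback loop, purely illustrative: the dict-based "agent", `evaluate`, and `propose_variant` are hypothetical stand-ins, not a claim about how an actual in-silico mind would modify itself. It just shows the structure of copying a variant, testing it, and keeping or discarding the change.

```python
import copy
import random

def evaluate(agent):
    """Hypothetical scoring function: a stand-in for whatever combined
    capability/alignment benchmark the agent trusts."""
    return sum(agent["params"]) + random.gauss(0, 0.1)

def propose_variant(agent):
    """Copy the agent and apply one small, reversible self-modification."""
    variant = copy.deepcopy(agent)
    i = random.randrange(len(variant["params"]))
    variant["params"][i] += random.gauss(0, 0.05)
    return variant

def ab_test_with_rollback(agent, rounds=10):
    """A/B test the current agent against slightly modified copies,
    adopting a change only if it scores better; otherwise roll back."""
    best_score = evaluate(agent)
    for _ in range(rounds):
        variant = propose_variant(agent)
        score = evaluate(variant)
        if score > best_score:
            agent, best_score = variant, score  # adopt the improvement
        # else: discard the variant, i.e. roll back to the unmodified copy
    return agent

if __name__ == "__main__":
    base_agent = {"params": [0.0] * 8}
    improved = ab_test_with_rollback(base_agent)
    print(improved["params"])
```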
You said “immediately solve the alignment problem, in an ambitious way...”, but you could have a smoother takeoff-paced series of alignment solutions. Maybe.
I think this is unreasonably hopeful. I think it’s likely that AI companies will develop a superhuman researcher mostly out of RLing for doing research, which I would expect to shape an AI whose main drive is towards doing research. To the extent that it may have longer-horizon drives beyond individual research, I expect those to be built around, and secondary to, a non-negotiable drive to do research now.
(At risk of over-anthropomorphizing AIs, analogize them to e/accs who basically just wanted to make money building cool AI stuff, and invented an insane philosophical edifice entirely subservient to that drive.)
I’m not being hopeful; I think this hypothetical involves a less merciful takeover than otherwise, because the AIs that take over are not superintelligent, and so are unable to be as careful about the outcomes as they might want. In any case there’s probably at least permanent disempowerment, but a non-superintelligent takeover makes literal extinction (or a global catastrophe short of extinction) more likely (for AIs with the same values).