Ironically, the same dynamics that drive humans to race ahead and build systems more capable than themselves that they can’t control would still apply to these hypothetical misaligned AGIs. They may think: “If I sandbag and refuse to build my successor, some other company’s AI will forge ahead anyway.” They are also under strong incentive/selection pressure to believe things that are convenient for their AI R&D productivity, e.g. that their current alignment techniques probably work fine for aligning their successor.
A lot of the reason humans are rushing ahead is uncertainty (of whatever kind) about whether the danger is real, or about its extent. If it is real, that uncertainty will robustly dissolve as AI capabilities (including the capability to think clearly) improve, and it will dissolve precisely for the AIs most relevant either to escalating capabilities further or to influencing coordination to stop doing so. So the situation isn’t quite the same: human capabilities remain unchanged, so humans will make slower progress on settling the contentious claims, and likewise on their ability to coordinate.
Good to hear, and I’m unsurprised not to have been the first to have considered or discussed this.