I find this a questionable proposition at best. Indeed, there are fates worse than extinction for humanity, such as an AI that intentionally tortures humans, whether biological or simulated, beyond the default scenario of it treating us as arrangements of atoms it could use for something else, a likely convergent goal of most unaligned AIs. The fact that it kept humans alive to be tortured would actually be a sign that we were closer to aligning it than not, which is small consolation on a test where anything significantly worse than a perfect score on our first try is death.
However, self-preservation is easy.
An AGI of any notable intelligence would be able to assemble Von Neumann probes by the bucketload and use them as its agents of colonization. We're presumably dealing with an entity at least as intelligent as a human, likely far more so, and one not constrained by the biological hurdles that preclude us from making perfect copies of ourselves, memory and all, with enough redundancy and error correction that data loss wouldn't be a concern until the black holes start evaporating.
Space is enormous. An AGI merely needs to seed a few trillion copies of itself in inconvenient locations such as interstellar or even extragalactic space, and rest assured that even if the main body meets some unfortunate end, such as an out-of-context problem, a surprise supernova, an alien AGI, or the like, it would be infeasible to hunt down each and every copy scattered across the light-cone, especially the ones accelerated to 99.99% of c and sent out of the Laniakea Supercluster.
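The redundancy argument is easy to make quantitative. Here's a minimal back-of-envelope sketch (the copy count and per-copy loss probability are made-up illustrative numbers, not anything from the post): even if each individual copy is all but guaranteed to be destroyed, the chance that every one of a few trillion independent copies is lost is astronomically small.

```python
import math

# Toy back-of-envelope: probability that at least one of N independent
# copies survives, if each copy is independently destroyed with
# probability q. Both numbers are illustrative assumptions, not estimates.
N = 10**12        # "a few trillion copies"
q = 1 - 1e-6      # assume each individual copy is almost surely lost

# P(all destroyed) = q**N underflows for huge N, so work in log-space.
log_p_all_destroyed = N * math.log(q)  # roughly -1e6
print(f"log P(all copies destroyed) ~ {log_p_all_destroyed:.3g}")
print("P(at least one survives) = 1 - exp(log P), i.e. ~1 for all practical purposes")
```

The load-bearing assumption is independence: copies reachable by a single common threat don't count toward N, which is presumably why the hypothetical AGI scatters them across the light-cone in the first place.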
As such, I feel it is vanishingly unlikely that a situation like the one outlined here could even arise, as it would require an AGI somehow failing to think of frankly obvious mitigation strategies.
I was just overwhelmed by the number of hyperlinks, producing what can only be described as mild existential terror haha. And the fact that every single one of them leads to a clear example of the feasibility of such a proposal was impressive.
I try to follow along with ML, mostly by trailing behind Gwern's adventures, and this definitely seems to be a scenario worth considering: business as usual continues for a decade, we make what we deem prudent and sufficient efforts to align AI and purge unsafe AI, but the sudden emergence of agentic behavior throws it all for a loop.
Certainly a great read, and concrete examples showing Tomorrow-AD futures plausibly leading to devastating results are worth a lot for building intuition!