Vladimir_Nesov comments on O O’s Shortform

Vladimir_Nesov 7 Jun 2023 15:37 UTC
2 points
This is a well-known hypothetical. What goes with it is the remaining possibility of de novo creation of additional AGIs that either have an architecture particularly suited for self-aligned self-improvement (with whatever values make it tractable), or that ignore the alignment issue and pursue the task of capability improvement heedless of the resulting value drift. Already having an AGI in the world doesn’t automatically rule out creation of more AGIs with different values and architectures; it only makes it easier.

Humans will definitely do this, using all the AI/AGI assistance they can wield. Insufficiently smart or sufficiently weird agentic AGIs will do this. A world that doesn’t have security in depth to guard against this happening will do this. What it takes to get a safe world is either getting rid of the capability (not having AGIs and GPUs freely available) or having sufficiently powerful oversight over all things that can be done.

A superintelligence that’s not specifically aimed at avoiding setting up such security will probably convergently set it up. But it would also need to already be more than concerningly powerful to succeed, even if it has the world’s permission and endorsement. If it does succeed, there is some possibility of not getting into a further FOOM than that, for a little bit, while it’s converting the Moon into computing substrate.