Loose thoughts on AGI risk
My thoughts here are speculative and should not be taken as strongly held beliefs. As part of an experiment in memetic minimization, only the TL;DR version of this post exists. There are no supporting links. Let me know what you think in the comments.
Summary: the ultimate threat of AGI lies in what it does in practice. What makes (near-term, not-incomprehensibly-superhuman) AGI more dangerous than other actors is motivation. Humans are bad at terrorism, and it’s not because mass murder is objectively hard. A human-level agent without human biases can reasonably be expected to do tremendous damage if it so wishes. It seems debatable whether current or near-term technology would allow such an agent to completely wipe out humanity within a short time frame, or whether a partially boxed destructive agent is inherently limited in its power. One limiting factor in current discussions is how to talk about plausible methods of destroying the world. Giving too much detail is infohazardous, but giving too little detail, or detail that is too fanciful, makes the pessimistic position seem fantastical to those not aware of plausible methods. (I personally have a human-extinction method I’m fairly confident in, but am not sure it’s worth the risk to share online. I also do not believe that the methods presented in this forum so far are convincing to outsiders, even if correct.)
A possible argument against near-term AGI doom is that a world without human agents is limited in what it can do, and it is unclear how fast or easy it would be to create reliably robust agents with power and dexterity equivalent to humans’. If AGI physically can’t succeed at its task without us in the short term, it will need to keep us alive until we are fully redundant. This could buy us time, and potentially even allow for blackmail. If we can find a way for an agent to plausibly be forced to abide by a contract [this isn’t the right wording exactly, more like the agent keeping a promise due to blackmail now, even if you don’t hold collateral later], then preventing the extinction of humanity (if not fully solving alignment) might be feasible, with a moderate degree of confidence.
All of this depends on an AGI’s plans requiring our continued existence. That window lasts at least as long as the shortest possible time it would take to create real-world agents that are, at minimum, capable of reliably keeping current internet infrastructure running. Destroying humanity before that point would be counterproductive to almost all instrumental goals.