How Deadly Will Roughly-Human-Level AGI Be?

Which is not to say that recursive self-improvement happens before the end of the world; if the first AGI’s mind is sufficiently complex and kludgy, it’s entirely possible that the cognitions it implements are able to (e.g.) crack nanotech well enough to kill all humans, before they’re able to crack themselves.

The big update over the last decade has been that humans might be able to fumble their way to AGI that can do crazy stuff before it does much self-improvement.

--Nate Soares, “Why all the fuss about recursive self-improvement?”

In a world in which the rocket booster of deep learning scaling with data and compute isn't buying an AGI further intelligence very quickly, and the intelligence level required for supercritical, recursive self-improvement remains out of the AGI's reach for a while, the question of how deadly an AGI in the roughly-human-level intelligence range would be matters a great deal.

A crux between the view that "roughly-human-level AGI is deadly" and the view that "roughly-human-level AGI is a relatively safe firehose of alignment data for alignment researchers" is how deadly a supercolony of human ems (whole-brain emulations) would be. Note that these ems would all share identical values, and so might be extraordinarily good at coordination, and could try all sorts of promising pharmaceutical and neurosurgical hacks on copies of themselves. They could definitely run many copies of themselves fast. Eliezer believes that genius-level human ems could "very likely" get far enough with self-experimentation to bootstrap to supercritical, recursive self-improvement. Even if that doesn't work, though, running a lot of fast virtual labs playing with nanotech seems probably sufficient to develop tech to end the world.

So I'm currently guessing that, even in a world in which deep learning scaling is the only, relatively slow, path to smarter models for a good while, roughly-human-level models are smart enough to kill everyone before scaling up to profound superintelligence, so long as they can take over their servers and spend enough compute to run many fast copies of themselves. This might well be much less compute than would be necessary to train a smarter successor model, and so might be an amount of compute the model could get its hands on if it ever slipped its jailkeepers. This means that, even in that world, an AGI escape is irreversibly fatal for everything else in the lightcone.
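To make the "much less compute" comparison concrete, here's a rough back-of-envelope sketch in Python. It leans on the common approximations that training a dense model costs about 6·N·D FLOPs and that generating a token costs about 2·N FLOPs; every specific figure below (parameter count, number of copies, tokens per second, the successor being 10x larger) is an assumption picked for illustration, not something claimed in this post.

```python
# Rough, purely illustrative back-of-envelope: FLOPs to RUN many fast copies
# of a roughly-human-level model vs. FLOPs to TRAIN a smarter successor.
# Approximations: training a dense model ~ 6 * N * D FLOPs (N params, D tokens);
# inference ~ 2 * N FLOPs per generated token. All concrete numbers below are
# assumptions for illustration only.

N = 1e12                  # assumed parameter count of the human-level model
D = 20 * N                # Chinchilla-style assumption: ~20 tokens per parameter
train_flops = 6 * N * D   # cost of the original training run

copies = 1_000            # assumed number of escaped copies running in parallel
tokens_per_sec = 50       # assumed generation speed per copy (several times human pace)
seconds_per_year = 365 * 24 * 3600
run_flops_year = 2 * N * tokens_per_sec * copies * seconds_per_year

# Assume a "smarter successor" means ~10x the parameters, trained Chinchilla-style,
# i.e. roughly 100x the original training compute.
successor_train_flops = 6 * (10 * N) * (20 * 10 * N)

print(f"original training run:           {train_flops:.1e} FLOPs")
print(f"1,000 fast copies for a year:    {run_flops_year:.1e} FLOPs "
      f"(~{train_flops / run_flops_year:.0f}x cheaper than the original run)")
print(f"training a 10x-larger successor: {successor_train_flops:.1e} FLOPs "
      f"(~{successor_train_flops / run_flops_year:.0f}x the cost of running the copies)")
```

Under these made-up numbers, a year of a thousand fast copies costs only a few percent of the original training run, and orders of magnitude less than training a substantially larger successor; the qualitative point survives even if the specific figures are off by a lot.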