[W]iping out humanity is the most expensive of these options and the AGI would likely get itself destroyed while trying to do that[.]
It would be pretty easy and cheap for something much smarter than a human to kill all humans. The classic scenario is:
A. [...] The notion of a ‘superintelligence’ is not that it sits around in Goldman Sachs’s basement trading stocks for its corporate masters. The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of proteins which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then… well, it doesn’t really matter from our perspective what comes after that, because from a human perspective any technology more advanced than molecular nanotech is just overkill. A superintelligence with molecular nanotech does not wait for you to buy things from it in order for it to acquire money. It just moves atoms around into whatever molecular structures or large-scale structures it wants.
Q. How would it get the energy to move those atoms, if not by buying electricity from existing power plants? Solar power?
A. Indeed, one popular speculation is that optimal use of a star system’s resources is to disassemble local gas giants (Jupiter in our case) for the raw materials to build a Dyson Sphere, an enclosure that captures all of a star’s energy output. This does not involve buying solar panels from human manufacturers, rather it involves self-replicating machinery which builds copies of itself on a rapid exponential curve -
If the smarter-than-human system doesn’t initially have Internet access, it will probably be able to get such access either by manipulating humans, or by exploiting the physical world in unanticipated ways (cf. Bird and Layzell 2002).
But also, if enough people have AGI systems it’s not as though no one will ever hook it up to the Internet, any more than you could give a nuke to every human on Earth and expect no one to ever use theirs.
Eliezer gives one example of a way to kill humanity with nanotech in his conversation with Jaan Tallinn:

[...] Killing all humans is the obvious, probably resource-minimal measure to prevent those humans from building another AGI inside the solar system, which could be genuinely problematic. The cost of a few micrograms of botulinum per human is really not that high and you get to reuse the diamondoid bacteria afterwards.
[… I]n my lower-bound concretely-visualized strategy for how I would do it, the AI either proliferates or activates already-proliferated tiny diamondoid bacteria and everybody immediately falls over dead during the same 1-second period, which minimizes the tiny probability of any unforeseen disruptions that could be caused by a human responding to a visible attack via some avenue that had not left any shadow on the Internet, previously scanned parts of the physical world, or other things the AI could look at. [...]
Are you assuming that early AGI systems won’t be much smarter than a human?
Most likely, a typical AGI will have some mundane, neutral-to-benevolent goal like “maximize profit by running this steel factory and selling steel”.
I don’t think that goal is “neutral-to-benevolent”, but I also don’t think any early AGI systems will have goals remotely like that. Two reasons for that:
1. We have no idea how to align AI so that it reliably pursues any intended goal in the physical world, and we aren’t on track to figure that out before AGI arrives. “Maximize profit by running this steel factory and selling steel” might be a goal the human operators have for the system; but the actual goal the system ends up optimizing will be something very different: “whatever goal (and overall model) happened to perform well in training, after a blind gradient-descent-ish search for goals (and overall models)”.
2. If you can reliably instill an ultimate goal like “maximize profit by running this steel factory and selling steel” into an AGI system, then you’ve already mostly solved the alignment problem and eliminated most of the risk.
A more minor objection to this visualization: By default, I expect AGI to vastly exceed human intelligence and destroy the world long before it’s deployed in commercial applications. Instead, I’d expect early-stage research AI to destroy the world.