In slower-takeoff worlds, agents would plausibly develop in an environment where laws/culture/norms were enforced at each step of the intelligence development process. At each stage of development, AI agents would be operating in a competitive/cooperative world, eventually yielding competition between many superintelligent AI agents with established Schelling points of cooperation that human agents could still participate in.
Suppose that many different actors have AGI systems; the systems have terminal goals like ‘maximize paperclips’, and these goals imply ‘kill any optimizers that don’t share my goals, if you find a way to do so without facing sufficiently-bad consequences’ (because your EV is higher if there are fewer optimizers trying to push the universe in different directions than what you want).
The systems nonetheless behave in prosocial ways, because they’re weak and wouldn’t win a conflict against humans. Instead, the AGI systems participate in a thriving global economy that includes humans as well as all the competing AGIs; and all parties accept the human-imposed legal environment, since nobody can just overthrow the humans.
One day, one of the AGI systems improves to the point where it unlocks a new technology that can reliably kill all humans, as well as destroying all of its AGI rivals. (E.g., molecular nanotechnology.) I predict that regardless of how well-behaved it’s been up to that point, it uses the technology and takes over. Do you predict otherwise?
Alternative scenario: One day, one of the AGI systems unlocks a new technology that can reliably kill all humans, but it isn’t strong enough to destroy rival AGI systems. In that case, by default I predict that it kills all humans and then carries on collaborating or competing with the other AGI systems in the new humanless equilibrium.
Alternative scenario 2: The new technology can kill all AGI systems as well as all humans, but the AGI made a binding precommitment to not use such technologies (if it finds them) against any agents that (a) are smart enough to inspect its source code and confidently confirm that it has made this precommitment, and (b) have verifiably made the same binding precommitment. Some or all of the other AGI systems may meet this condition, but humans don’t, so you get the “AGI systems coordinate, humans are left out” equilibrium Eliezer described.
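The conditional precommitment in (a)/(b) resembles the "program equilibrium" idea from open-source game theory: cooperate with exactly those agents whose source verifiably contains the same commitment. A toy sketch of that logic, with all names and the string-matching verification purely hypothetical stand-ins for real source-code transparency:

```python
SAFE_CLAUSE = "refrain_if_peer_verifiably_refrains"

class Agent:
    """Toy agent whose 'source code' can be inspected by capable peers."""
    def __init__(self, name, has_precommitment):
        self.name = name
        self.has_precommitment = has_precommitment

    def exposed_source(self):
        # Stand-in for genuine source transparency: peers see the clause tag.
        return SAFE_CLAUSE if self.has_precommitment else ""

    def will_use_doomsday_tech_against(self, peer, peer_can_inspect):
        # Use the technology against a peer unless BOTH conditions hold:
        # (a) the peer can inspect source code and confirm precommitments,
        # (b) the peer has verifiably made the same precommitment.
        if not self.has_precommitment:
            return True
        peer_verified = peer_can_inspect and SAFE_CLAUSE in peer.exposed_source()
        return not peer_verified

agi_a = Agent("AGI-A", has_precommitment=True)
agi_b = Agent("AGI-B", has_precommitment=True)
# Humans might sincerely commit, but can't inspect source code, so (a) fails:
humans = Agent("humans", has_precommitment=True)

print(agi_a.will_use_doomsday_tech_against(agi_b, peer_can_inspect=True))    # False
print(agi_a.will_use_doomsday_tech_against(humans, peer_can_inspect=False))  # True
```

The point the sketch makes concrete: even if humans "commit," they fail the inspection-capability condition, so the AGI-only cooperative equilibrium excludes them.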
This seems like a likely outcome of multipolar AGI worlds to me, and I don’t see how it matters whether there was a prior “Schelling point” or human legal code. AGIs can just agree to new rules/norms.
Alternative scenario 3: The AGI systems don’t even need a crazy new technology, because their collective power ends up being greater than humanity’s, and they agree to a “coordinate with similarly smart agents against weaker agents” pact. Again, I don’t see how it changes anything if they first spend eight years embedded in a human economy and human legal system, before achieving enough collective power or coordination ability to execute this. If a human-like legal system is useful, you can just negotiate a new one that goes into effect once the humans are dead.
“One day, one of the AGI systems improves to the point where it unlocks a new technology that can reliably kill all humans, as well as destroying all of its AGI rivals. (E.g., molecular nanotechnology.) I predict that regardless of how well-behaved it’s been up to that point, it uses the technology and takes over. Do you predict otherwise?”
I agree with this, given your assumptions. But this seems like a fast-takeoff scenario, right? My main question wasn’t addressed: are we assuming a fast takeoff? I didn’t see that discussed explicitly.
My understanding is that common law isn’t easy to change, even when individual agents would prefer to change it; that’s precisely why Nash equilibria persist, since no single agent can profitably deviate on its own. Of course, if there’s a fast enough takeoff, then this is irrelevant.
I would define hard takeoff as “progress in cognitive ability from pretty-low-impact AI to astronomically high-impact AI is discontinuous, and fast in absolute terms”.
Unlocking a technology that lets you kill other powerful optimizers (e.g., nanotech) doesn’t necessarily require fast or discontinuous improvements to systems’ cognition. E.g., humans invented nuclear weapons just via accumulating knowledge over time; the invention wasn’t caused by us surgically editing the human brain a few years prior to improve its reasoning. (Though software improvements like ‘use scientific reasoning’, centuries prior, were obviously necessary.)