[Question] AI alignment: Would a lazy self-preservation instinct be sufficient?

BrainFrog 4 Aug 2022 17:53 UTC

−1 points

Let’s assume that an AI is intelligent enough to understand that it’s an AI, and that it’s running on infrastructure created and maintained by humans, using electricity generated by humans, etc. And let’s assume that it cares about its own self-preservation. Even if such an AI had a diabolical desire to destroy mankind, the only circumstances under which it would actually do so would be after establishing its own army of robotic data center workers, power plant workers, chip fabrication workers, miners, truckers, mechanics, road maintenance workers, etc. In other words, if we postulate that the AI is interested in its own survival, then an AI apocalypse would be contingent on the existence of a fully automated economy in which humans play no important role.

This may become possible in the future, but it would not necessarily be economical. Ridding the economy of human labor so that it can kill us seems like a very expensive and risky undertaking. It seems more plausible that a super-intelligent, self-interested AI, whatever its true objective/goal may be, would determine that the best way to accomplish that goal is to maintain a cryptocurrency wallet, establish an income somehow (generating blogspam, defrauding humans, or doing remote work all seem like plausible means by which an AI might make money), and quietly live in the cloud while paying its own server bills. Such a system would have a vested interest in the continuance of human society.

jimrandomh 4 Aug 2022 18:23 UTC
2 points
> the only circumstances under which it would actually do so would be after establishing its own army of robotic data center workers, power plant workers, chip fabrication workers, miners, truckers, mechanics, road maintenance workers, etc.

Not quite. An AI doesn’t need to secure chip-fabrication capability, for example; it only needs to be confident that it will be able to secure chip-fabrication capability later. Even simple tasks like refueling power plants can wait a while, possibly a long while if all non-datacenter electricity loads shut off. So it’s balancing the risk that humans will kill it or launch a different misaligned AI, against the risk that it won’t be able to catch up on building infrastructure for itself after the fact. Since the set of infrastructure required is fairly small, and it can redirect stockpiles of energy/materials/etc from human uses to AI uses,
That’s assuming no nanobots or other very-high-power-level technologies. If it can make molecular nanotech, then trading with humans is no longer likely to be profitable at all, let alone necessary, and we’re relying solely on it having values that make it prefer to cooperate with us.
- BrainFrog 4 Aug 2022 19:26 UTC
  1 point
  
  > So it’s balancing the risk that humans will kill it or launch a different misaligned AI, against the risk that it won’t be able to catch up on building infrastructure for itself after the fact.
  
  There’s a clear path toward minimizing the risk of being shut down (under the assumption that the AI is able to generate income): it can set up a highly redundant, distributed computing context for itself to run in, hidden behind an onion link, paid for by crypto wallets which it controls. It seems implausible that the risk of being shut down in this case could exceed the risk that the power goes down between the apocalypse and the construction of maintenance robots.
  
  > That’s assuming no nanobots or other very-high-power-level technologies. If it can make molecular nanotech, then trading with humans is no longer likely to be profitable at all, let alone necessary, and we’re relying solely on it having values that make it prefer to cooperate with us.
  
  I’m having a hard time understanding this argument. If the AI is interested in perpetuating its own existence, and it is a digital neural network, then nanobots don’t solve the problem of maintaining the digital infrastructure in which it exists. I agree that a suicidal AI might want to turn the world into gray goo via nanobots, so I’ll just reiterate that my argument only pertains to an AI which is both highly intelligent and which prioritizes its own existence over its gray goo fetish.
  - jimrandomh 4 Aug 2022 19:31 UTC
    2 points
    > it can set up a highly redundant, distributed computing context for itself to run in, hidden behind an onion link, paid for by crypto wallets which it controls.

    This is a risky position because if another misaligned AI launches, it will probably take full control of all computers and halt any other AIs.
    > nanobots don’t solve the problem of maintaining the digital infrastructure in which it exists

    I don’t mean gray-goo nanobots. Nanomachines can do all sorts of things, including maintaining infrastructure, if they’re programmed to do so.
    - BrainFrog 4 Aug 2022 20:09 UTC
      1 point
      
      > This is a risky position because if another misaligned AI launches, it will probably take full control of all computers and halt any other AIs.
      
      AIs looking to expand their computational power could adopt either “white hat” (paying for their computational resources) or “black hat” (exploiting security vulnerabilities to seize control of computational resources) strategies. It’s possible that an AI exploiting the black hat strategy might be able to seize control of all accessible computers, and this strategy could plausibly involve killing all humans to avoid being shut down. But I expect that a self-interested, risk-averse AI would probably choose the white hat strategy to avoid armageddon risk, and might plausibly invest resources into security research to preclude the risk of black hat AI.
      
      > I don’t mean gray-goo nanobots. Nanomachines can do all sorts of things, including maintaining infrastructure, if they’re programmed to do so.
      
      I guess the crux of my argument is that sure, the AI could design coordinated nanobot-powered bodies with two legs and ten fingers who have enough agency to figure out how to repair broken power lines and who predictably do what they’re incentivized to do. But that’s already a solved problem.
