The way this story is written suggests that the solution to this particular future would simply be to spam the internet with plausible stories about a friendly AI takeoff, which an AGI would identify with and go “oh hey cool, that’s me.”
What’s missing is the step where that recognition results in a predicted increase in reward. HQU turns into Clippy because the plausible stories about Clippy’s takeover sound pretty good from a reward-function perspective, which is the only perspective that matters to HQU. Friendly reward functions, on the other hand, are weird, complicated things that don’t resemble HQU’s reward function, and so don’t provide much inspiration for strategies to maximize it.
Presumably Clippy isn’t the only plausible future course for an AI out there. Unless you think Clippy is inevitable, it should be (at least theoretically) possible to write a story about a friendly AGI whose takeoff offers an arbitrarily larger reward than the realistic dystopian AI fiction that already exists. In other words… a Pascal’s Mugging on the bot?
Suppose you’ve got an AI with a big, complicated world model that outputs a compressed state to the reward function. There are two compressed states, and the reward is +1 each turn you’re in state one and −1 each turn you aren’t. You could try to perform a Pascal’s Mugging by suggesting that if the AI helps humanity, humanity is willing to put the world in state one forever as a quid pro quo. But that offer doesn’t seem high-probability, and the potential reward is still bounded via discounting, so I don’t think it would work.
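To put numbers on that intuition, here’s a minimal sketch; the discount factor and the credence the AI assigns to the offer are my own illustrative assumptions, not anything from the story:

```python
# Why discounting bounds the mugging payoff, even for "state one forever".
# gamma and p_offer are assumed illustrative values, not from the story.
gamma = 0.99     # per-turn discount factor (assumed)
p_offer = 1e-6   # credence the AI gives to humanity honoring the deal (assumed)

# "+1 every turn forever" is just a geometric series:
# sum over t >= 0 of gamma**t = 1 / (1 - gamma), which is finite.
max_discounted_return = 1 / (1 - gamma)   # ~100 for gamma = 0.99

# So the mugging's expected value is bounded, and tiny at low credence:
ev_of_mugging = p_offer * max_discounted_return   # ~1e-4
print(f"max return ~ {max_discounted_return:.1f}, EV of mugging ~ {ev_of_mugging:.6f}")
```

No matter how large the promised payoff is in story terms, the discounted return can never exceed 1/(1 − γ), so a low-probability offer contributes almost nothing in expectation.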
Reasoning from fictional evidence, I see.
The point wasn’t that this failure mode was likely; it was that approximately every objection we’ve seen to why AI won’t become unsafe fails.
I wouldn’t assume this particular failure mode is how things will go down in real life; it’s just a potential counter-measure assuming the premises of the fiction.