Yitz comments on It Looks Like You’re Trying To Take Over The World

Yitz 10 Mar 2022 18:48 UTC
5 points
Presumably Clippy isn’t the only plausible future course for an AI. Unless you think Clippy is inevitable, it should be (at least theoretically) possible to write a story about a friendly AGI whose reward is arbitrarily larger than anything presented in existing realistic dystopian AI fiction. In other words… a Pascal’s Mugging on the bot?
- Algon 10 Mar 2022 20:26 UTC
  3 points
  Suppose you’ve got an AI with a big old complicated world model, which outputs a compressed state to the reward function. There are two compressed states. Each turn, the reward is +1 if you’re in state one and −1 if you aren’t. I guess you could try to perform a Pascal’s mugging by suggesting that if you help humanity, humans will put the world in state one forever as a quid pro quo. But that doesn’t seem very probable, and the potential reward is still bounded via discounting, so I don’t think that would work.
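To make the discounting point in the reply concrete, here is a minimal sketch. It assumes geometric discounting with a factor `gamma` below 1, which the thread does not specify; `gamma`, the probability `p`, and all function names are hypothetical and chosen only for illustration. With a per-turn reward in [−1, +1], the total discounted return can never exceed 1 / (1 − gamma) in magnitude, so the promise of "state one forever" has a finite ceiling no matter how the story is told.

```python
def discounted_return(reward_per_turn, gamma, turns):
    """Sum reward_per_turn * gamma**t for t = 0 .. turns - 1."""
    total = 0.0
    discount = 1.0
    for _ in range(turns):
        total += reward_per_turn * discount
        discount *= gamma
    return total


def return_bound(max_abs_reward, gamma):
    """Closed-form limit of the geometric series: max_abs_reward / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1) for the sum to be bounded")
    return max_abs_reward / (1.0 - gamma)


def max_expected_gain(p, gamma):
    """Upper bound on the expected gain from accepting the offer.

    Best case: +1 every turn forever instead of -1 every turn forever,
    a swing of at most 2 * bound, weighted by the probability p that
    the promise is actually kept.
    """
    return p * 2.0 * return_bound(1.0, gamma)


gamma = 0.99   # assumed discount factor, not from the thread
p = 1e-6       # assumed probability that the promise is honored

always_state_one = discounted_return(+1.0, gamma, 10_000)
never_state_one = discounted_return(-1.0, gamma, 10_000)
bound = return_bound(1.0, gamma)
gain = max_expected_gain(p, gamma)
```

Under these assumptions the mugger cannot raise the payoff past the bound, so the only lever left is the probability `p`, which is the part the reply judges to be low.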