Presumably Clippy isn’t the only plausible future course for an AI. Unless you think Clippy is inevitable, it should be (at least theoretically) possible to write a story about a friendly AGI whose reward is arbitrarily larger than anything in the existing realistic dystopian AI fiction. In other words… a Pascal’s Mugging on the bot?
Suppose you’ve got an AI with a big old complicated world model, which outputs a compressed state to the reward function. There are two compressed states. The reward function is +1 for each turn you’re in state one, and −1 for each turn you aren’t. I guess you could try to perform a Pascal’s mugging by suggesting that if the AI helps humanity, humanity would be willing to put the world in state one forever as a quid pro quo. But that doesn’t seem very probable, and the potential reward is still bounded because of discounting: however long “forever” is, the discounted sum of +1 rewards converges to a finite number. So I don’t think that would work.
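To make the boundedness point concrete, here is a minimal sketch of the arithmetic. It assumes standard geometric discounting with a factor gamma strictly below 1, which is my assumption and not something stated above; the function names and the example numbers (gamma = 0.99, a one-in-a-million chance the promise is kept) are likewise purely illustrative.

```python
def discounted_return(reward_per_turn, gamma, horizon=None):
    """Discounted sum of a constant per-turn reward.

    With horizon=None this is the infinite-horizon geometric series
    reward / (1 - gamma); otherwise it is the sum over `horizon` turns.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1) for the sum to converge")
    if horizon is None:
        return reward_per_turn / (1.0 - gamma)
    return reward_per_turn * (1.0 - gamma ** horizon) / (1.0 - gamma)


def mugging_upper_bound(p_honored, gamma):
    """Upper bound on the expected gain from accepting the mugger's offer.

    The best case is +1 every turn forever and the worst case is -1 every
    turn forever, so no promise can be worth more than the gap between them,
    weighted by the (assumed) probability that the promise is honored.
    """
    best = discounted_return(+1.0, gamma)
    worst = discounted_return(-1.0, gamma)
    return p_honored * (best - worst)


if __name__ == "__main__":
    gamma = 0.99        # hypothetical discount factor
    p_honored = 1e-6    # hypothetical probability the promise is kept
    print("value of state one forever:", discounted_return(1.0, gamma))
    print("upper bound on expected gain:", mugging_upper_bound(p_honored, gamma))
```

With these made-up numbers, “state one forever” is worth roughly 100 units of discounted reward, so the mugger cannot offset a tiny probability by promising a bigger payoff: the payoff has a ceiling of 2 / (1 − gamma). The closer gamma gets to 1 the higher that ceiling is, but it stays finite.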