I like that idea. So, if we assume that all sufficiently smart AIs are “good”, then we can put such an AI in a simulated world where the best way to acquire resources for its good deeds is to play a game on a computer provided by the Dark Lords of the Matrix (that’s us!). The goal of the game is to pretend to be a “bad” AI, except that the game is really an input/output channel into the real world. The whole system would then effectively constitute a bad AI, contradicting the initial assumption.
However, anyone who seriously claims that sufficiently smart AIs will automatically be nice will probably reject that argument too, by claiming that, well, a sufficiently smart AI would figure out that it is being tricked and would refuse to cooperate.
(Also: you could call it the “Ender’s Game” argument if you’re aiming for memorability more than respectability.)