The problem is the way we train AIs. We ALWAYS minimize error and optimize towards a limit. If I train an AI to take a bite out of an apple, what I am really doing is showing it thousands of example situations and rewarding it whenever its actions increase the probability that it eats the apple.
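Concretely, that training setup looks something like the toy loop below. This is a minimal sketch, not any real training code: the situations, the action set, and the goal_probability estimate are all made-up stand-ins for whatever the real environment and reward signal would be.

```python
import random

ACTIONS = ["reach", "grasp", "bite", "wait"]

def goal_probability(situation, action):
    # Hypothetical estimate of P(the apple gets eaten) after taking `action`
    # in `situation`. A real trainer would get this from the environment or
    # a learned model; here it is a hand-made toy.
    bonus = {"reach": 0.1, "grasp": 0.2, "bite": 0.5, "wait": 0.0}[action]
    return min(1.0, situation / 10.0 + bonus)

# The "policy": a preference score per action, nudged upward whenever the
# action raised the estimated probability of the goal.
preferences = {a: 0.0 for a in ACTIONS}

for _ in range(10_000):  # thousands of example situations
    situation = random.randint(0, 5)
    weights = [2.0 ** preferences[a] for a in ACTIONS]
    action = random.choices(ACTIONS, weights=weights)[0]

    before = goal_probability(situation, "wait")   # doing nothing
    after = goal_probability(situation, action)
    reward = after - before                        # reward = increase in P(goal)

    preferences[action] += 0.1 * reward            # reinforce whatever helped

# After training, the policy strongly prefers actions that push P(apple eaten)
# up; nothing in the reward ever tells it to stop once an apple has been eaten.
print(max(preferences, key=preferences.get))       # prints "bite"
```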
Now let’s say it becomes superintelligent. It doesn’t just eat one apple and say “cool, I am done, time to shut down.” No, we taught it to optimize the situation so as to improve the probability that it eats an apple. For lack of a better word, it feels “pleasure” in steering situations towards taking a bite out of an apple.
Once the probability of eating an apple reaches 100%, that probability eventually drops again as the apple is eaten, and the AI will once more start optimizing towards eating another one.
It will try to set up situations where it eats apples for all eternity. (Assuming superintelligence does not result in some type of goal enlightenment.)
Ok, ok, you say. Well, we will just hard-code it to turn off once it reaches a certain probability of meeting its goal. Good idea. Once it reaches a 99.9% probability of taking a bite out of an apple, we automatically turn it off. That will probably work for an apple-eating AI.
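In code, that kill switch is the easy part. Something like the hypothetical check below, bolted onto the control loop, would do it (the agent interface here is invented purely for illustration):

```python
SHUTDOWN_THRESHOLD = 0.999  # shut down at 99.9% probability of a bite

def control_step(agent, situation):
    # Hypothetical wrapper around the agent's normal step: if its own
    # estimate of goal success crosses the threshold, power it down
    # instead of letting it keep optimizing.
    if agent.estimated_goal_probability(situation) >= SHUTDOWN_THRESHOLD:
        agent.shut_down()
    else:
        agent.act(situation)
```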
But what if our goal is more complicated, like fixing climate change? Well, the AI may reach superintelligence before finishing the goal and decide it doesn’t want to be shut down. Good luck stopping it.