I just wrote a post replying to part of Bostrom’s talk, but apparently I need 20 Karma points to post it, so… let it be a long comment instead:
Bostrom should modify his standard reply to the common “We’d just shut off / contain the AI” claim
In his most recent TED Talk, “What happens when our computers get smarter than we are?”, Superintelligence author Prof. Nick Bostrom spends over two minutes replying to a common claim: that we could just shut off an AI, or preemptively contain it in a box, to prevent it from doing things we don’t like, and that there is therefore no need to be too concerned about the possible future development of AI with misconceived or poorly specified goals:
Now you might say, if a computer starts sticking electrodes into people’s faces, we’d just shut it off. A, this is not necessarily so easy to do if we’ve grown dependent on the system—like, where is the off switch to the Internet? B, why haven’t the chimpanzees flicked the off switch to humanity, or the Neanderthals? They certainly had reasons. We have an off switch, for example, right here. (Choking) The reason is that we are an intelligent adversary; we can anticipate threats and plan around them. But so could a superintelligent agent, and it would be much better at that than we are. The point is, we should not be confident that we have this under control here.
And we could try to make our job a little bit easier by, say, putting the A.I. in a box, like a secure software environment, a virtual reality simulation from which it cannot escape. But how confident can we be that the A.I. couldn’t find a bug? Given that merely human hackers find bugs all the time, I’d say, probably not very confident. So we disconnect the ethernet cable to create an air gap, but again, like merely human hackers routinely transgress air gaps using social engineering. Right now, as I speak, I’m sure there is some employee out there somewhere who has been talked into handing out her account details by somebody claiming to be from the I.T. department.
More creative scenarios are also possible, like if you’re the A.I., you can imagine wiggling electrodes around in your internal circuitry to create radio waves that you can use to communicate. Or maybe you could pretend to malfunction, and then when the programmers open you up to see what went wrong with you, they look at the source code—Bam! -- the manipulation can take place. Or it could output the blueprint to a really nifty technology, and when we implement it, it has some surreptitious side effect that the A.I. had planned. The point here is that we should not be confident in our ability to keep a superintelligent genie locked up in its bottle forever. Sooner or later, it will out.
If I recall correctly, Bostrom has replied to this claim in this manner in several of the talks he has given. While what he says is correct, I think that there is a more important point he should also be making when replying to this claim.
The point is that even if it were somehow feasible to contain an AI in a box so that it could not escape and cause damage, it would still be incredibly important for us to determine how to create AI that shares our interests and values (friendly AI). And we would still have great reason to be concerned about the creation of unfriendly AI. This is because other people, such as terrorists, could still create an unfriendly AI and intentionally release it into the world to wreak havoc and potentially cause an existential catastrophe.
The idea that we should not be too worried about figuring out how to make AI friendly, because we could always contain the AI in a box until we knew it was safe to release, is confused. The main problem with it is not that we couldn’t actually contain the AI in the box. Rather, the primary reason we have for wanting to figure out quickly how to make a friendly AI is so that we can make one before anyone else makes an unfriendly AI.
In his TED Talk, Bostrom continues:
I believe that the answer here is to figure out how to create superintelligent A.I. such that even if—when—it escapes, it is still safe because it is fundamentally on our side because it shares our values. I see no way around this difficult problem.
Bostrom could have strengthened his argument for the position that there is no way around this difficult problem by stating my point above.
That is, he could have pointed out that even if we somehow developed a reliable way to keep a superintelligent genie locked up in its bottle forever, we would still have to solve the difficult problem of creating friendly AI with human values. There would still be a high risk that other people in the world with not-so-good intentions would eventually develop an unfriendly AI and either intentionally release it upon the world or simply not exercise the caution necessary to keep it contained.
Once the technology to make superintelligent AI is developed, good people will be pressured to create friendly AI and let it take control of the future of the world ASAP. The longer they wait, the greater the risk that not-so-good people will develop AI that isn’t specifically designed to have human values. This is why solving the value alignment problem soon is so important.
I’m not sure your argument proves your claim. I think what you’ve shown is that there exist reasons other than the inability to create perfect boxes to care about the value alignment problem.
We can flip your argument around and apply it to your claim: imagine a world where there was only one team with the ability to make superintelligent AI. I would argue that it would still be extremely unsafe to build an AI and try to box it. I don’t think that this lets me conclude that a lack of boxing ability is the true reason that the value alignment problem is so important.
I agree that there are several reasons why solving the value alignment problem is important.
Note that when I said that Bostrom should “modify” his reply I didn’t mean that he should make a different point instead of the point he made, but rather meant that he should make another point in addition to the point he already made. As I said:
While what [Bostrom] says is correct, I think that there is a more important point he should also be making when replying to this claim.
This is my first comment on LessWrong.
Ah, I see. Fair enough!