Possible, but vanishingly unlikely. It basically requires the programmers to do everything right except for introducing one mistaken term into the goal system, then failing to catch the mistake, with the AI unable to resolve it without outside help, even after reading standard material on FAI.
Given how complicated goal systems are, I think that's actually rather likely. Remember what EY has said about Friendly AI being much, much harder than regular AI? I'm inclined to agree with him. The issue could easily come down to the programmers being overconfident and the AI not even being inclined to examine its own goals, focusing instead on improving its abilities.
So, the seed AI-in-a-box ends up spending its prodigious energies producing two things:
1) a successor
2) a checkable proof that said successor is friendly (proof checking is much easier than proof production).
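The verify/produce asymmetry the parenthetical relies on can be illustrated with a toy analogy (my example, not from the thread): confirming a candidate solution is often far cheaper than finding one. Checking a claimed factorization is a single multiplication, while producing the factorization by trial division can take on the order of sqrt(n) steps.

```python
def check_factorization(n, p, q):
    """Cheap verification: one multiplication plus sanity checks."""
    return 1 < p and 1 < q and p * q == n

def find_factorization(n):
    """Expensive search: trial division up to sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # n is prime

n = 10403  # = 101 * 103
p, q = find_factorization(n)  # slow part: searches ~100 candidates
print(check_factorization(n, p, q))  # fast part: one multiplication
```

The same asymmetry, in a much stronger form, is what makes "ship a machine-checkable proof alongside the successor" plausible: the humans (or a small trusted checker) only need to run the cheap verification step.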