Given how complicated goal systems are, I think that’s actually rather likely. Remember what EY has said about Friendly AI being much, much harder than regular AI? I’m inclined to agree with him. The failure could easily come down to the programmers being overconfident and the AI not even being inclined to think about the problem, focusing instead on improving its own abilities.
So, the seed AI-in-a-box ends up spending its prodigious energies producing two things:
1) a successor
2) a checkable proof that said successor is friendly (proof checking is much easier than proof producing).
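
To make the checking-versus-producing asymmetry concrete, here is a toy sketch using Boolean satisfiability as a stand-in. It is only an analogy: nothing here resembles a real friendliness proof, and the names `check` and `search` are made up for illustration. Verifying a proposed assignment takes one pass over the clauses, while finding one by brute force may try up to 2^n candidates.

```python
from itertools import product

# A CNF formula as a list of clauses; each clause is a list of nonzero ints.
# The literal k means "variable k is true", -k means "variable k is false".

def check(formula, assignment):
    """Verify a proposed certificate: a single pass over the clauses."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def search(formula, num_vars):
    """Produce a certificate by brute force: up to 2**num_vars candidates."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bit for i, bit in enumerate(bits)}
        if check(formula, assignment):
            return assignment
    return None  # no satisfying assignment exists

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
certificate = search(formula, 3)   # the expensive part
print(check(formula, certificate)) # the cheap part
```

The point of the analogy is that whoever sits outside the box only needs to run something like `check`, not `search`, and the checker can be small enough to audit by hand.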