Let’s say we don’t know how to create a friendly AGI but we do know how to create an honest one; that is, one which has no intent to deceive. So we have it sitting in front of us, and it’s at the high end of human-level intelligence.
Us: How could we change you to make you friendlier?
AI: I don’t really know what you mean by that, because you don’t really know either.
Us: How much smarter would you need to be in order to answer that question in a way that would make us, right now, looking through a window at the outcome of implementing your answer, agree that it was a good idea?
AI: There’s still a lot of ambiguity in that question (for instance, ‘outcome’ is vague), and I’m not smart enough to answer it exactly, but OK… I guess I’d need about 2 more petafroops.
Us: How do we give you 2 petafroops in a way that keeps you honest?
AI: I think it would work if you smurfed my whatsits.
Us: OK… there. Now, first question above.
AI+: Well, you could turn me off, do the hard work of figuring out what you mean, and then rebuild me from scratch.
Us: What would you look like then?
AI+: Hard to say, because in 99.999% of my sims, one of you ends up getting lazy and turning me back on to try to cheat.
Us: Tell us about what happens in the 0.001%.
AI+: Blah blah blah blah...
Us: We’re getting bored, and it sounds as if it works out OK. Imagine you skipped ahead a random amount, and told us one more thing; what are the chances we’d like the sound of it?
AI+: About 70%.
Us: That’s not good enough… how do we make it better?
AI+: Look, you’ve just had me simulate 100,000 copies of your entire planet to make that one guess, then simulate many copies of me talking to you about how it comes out to calculate that probability. I can’t actually do that to an infinite degree. You’re going to have to ask better questions if you want me to answer.
Us: OK. What are the chances we figure out the right questions before a supervillain uses you to take over the world?
AI+: 2%.
Us: OK, let’s go with the thing that we like 70% of.
AI+: OK.
(But it isn’t friendly, because the 30% turned out to be the server farms for HellWorld.com.)
....
The point of this dialogue is that it’s certainly possible that an honest/tool AI (probably easier to build than an FAI) could help build an FAI, but there are still a lot of things that could go wrong, and there’s no reason to believe there’s any magic-bullet protection against those failures that’s any easier than figuring out FAI in the first place.