Once AI exists in public, it isn’t containable. Even if we can box it, someone will build it without a box. Or, like you said, ask it how to make as many paperclips as possible.
But if we get to AI first, and we figure out how to box it and get it to do useful work, then we can use it to help solve FAI. Maybe. You could ask it questions like “how do I build a stable self-improving agent” or “what’s the best way to solve the value loading problem”, etc.
You would need some assurance that the AI would not try to manipulate the output. That’s the hard part, but it might be doable. And it may be restricted to only certain kinds of questions, but that’s still very useful.
Once AI exists in public, it isn’t containable.
You mean like the knowledge of how it was made is public and anyone can build it? Definitely not. But if you keep it all proprietary, it might be possible to contain.
But if we get to AI first, and we figure out how to box it and get it to do useful work, then we can use it to help solve FAI. Maybe.
I suppose what we should do is figure out how to make friendly AI, figure out how to create boxed AI, and then build an AI that’s probably friendly and probably boxed, and it’s more likely that everything won’t go horribly wrong.
You would need some assurance that the AI would not try to manipulate the output.
Manipulate it to do what? The idea behind mine is that the AI only cares about answering the questions you pose it, given that it has no inputs and everything operates to spec. I suppose it might try to do things to guarantee that it operates to spec, but it’s supposed to be assuming that.