Hmm, if information is still supposed to flow to the gatekeeper, then the person talking to the AI is effectively a gatekeeper too. Still, a box within a box isn't a bad idea. I got stuck wondering why the gatekeeper needs to receive information at all, but in the end it's simple: we need to be able to accept a friendly AI. But then how can you trust someone to be your ally without gaining information about their values? It may be a contradiction in terms, i.e. by definition you can't. Thus the box should be anything but black, and as white as other considerations allow.
There are reasons to test beyond checking whether the AI is friendly. An AI, like other software, would have to be tested fairly thoroughly. It would be hard to build an AI at all if we couldn't test it without destroying the world.
Isn’t that only a subcase of friendliness testing?
Not really. If you have an AI and you're not sure whether it's completely broken or merely unfriendly, you might want to test it, but without proper boxing you still risk destroying the world in the unlikely case that the AI works.