If an AI has superhuman intelligence, it will make all the decisions, since human minds and technology are full of loopholes and exploits. There is just no way we could contain it if it wanted to be free. If it is not significantly smarter than humans, then there is little danger in releasing it. Using an extra AI as a judge of safety can only work if the judge is at least as smart as the prisoner, in which case you need a judge for the judge, ad infinitum. Maybe the judge only needs to be, say, 90% as smart as the intelligence it is judging. Then it might be possible to have a finite chain of judges starting from an actual human, depending on how the probability of an error in judgment stacks up against the intelligence ratio at each step. Sort of like iterated amplification, or a blockchain.
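To make the "how does the error stack up" question concrete, here is a toy model. It assumes (none of this comes from the thread) that capability is a single number, that each judge can assess something at most 1/r times as capable as itself (r = 0.9 for the 90% case), and that each judgment independently goes wrong with the same fixed probability. The function names and all numbers are hypothetical and purely illustrative.

```python
import math


def judgment_steps(capability_ratio: float, r: float) -> int:
    """Hypothetical: number of judgment steps needed so a chain starting
    at a human (capability 1.0) reaches an AI that is `capability_ratio`
    times as capable, when each judge can only assess something up to
    1/r times its own capability (0 < r < 1)."""
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    # After n steps the chain can reach capability (1/r)**n,
    # so we need the smallest n with (1/r)**n >= capability_ratio.
    n = math.ceil(math.log(capability_ratio) / math.log(1 / r))
    # Even a prisoner no smarter than a human needs one judgment.
    return max(1, n)


def chain_reliability(steps: int, error_per_step: float) -> float:
    """Probability that no judgment in the chain is wrong, assuming
    errors are independent and equally likely at every step."""
    return (1 - error_per_step) ** steps


if __name__ == "__main__":
    steps = judgment_steps(10.0, 0.9)  # an AI 10x as capable, 90% judges
    print(steps, chain_reliability(steps, 0.01))
```

Under these assumptions, an AI ten times as capable as a human needs about 22 judgment steps at r = 0.9, and a 1% error rate per step leaves roughly an 80% chance that the whole chain judges correctly. Shrinking the ratio gap lengthens the chain, which is exactly the trade-off the comment describes.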
I think the point is that each judges the other, but we trust neither outright: they point out weaknesses in each other’s reasoning, so both have to reason in a way that can’t be shown false to us, and we hope that gives an advantage to the side of truth.
“And we hope that gives an advantage to the side of truth”—we aren’t even relying on that. We’re handicapping the AI that wants to be released in terms of message length.
Introducing a handicap to compensate for an asymmetry does not free us from needing to rely on the underlying process pointing toward the truth in the first place.
That’s a good point, except that you aren’t addressing my scheme as Gurkenglas explained it.