Verifying a proof is quite a bit simpler than coming up with the proof in the first place.
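As a toy illustration of that asymmetry, with integer factoring standing in for proof search (the numbers below are arbitrary and chosen only for illustration):

```python
# Checking a claimed factorization is one multiplication; finding the factors
# by trial division takes on the order of sqrt(n) steps.

def verify(n, factors):
    """Checking a claimed factorization: multiply and compare."""
    product = 1
    for f in factors:
        product *= f
    return product == n

def find_factor(n):
    """Finding a nontrivial factor by trial division (or n itself if prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

p, q = 104729, 1299709          # two known primes (the 10,000th and 100,000th)
n = p * q
print(verify(n, [p, q]))        # True, essentially instant
print(find_factor(n))           # 104729, after roughly 100,000 trial divisions
```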
It becomes more complicated when the author of the proof is a superintelligence trying to exploit flaws in the verifier. Probably more importantly, you may not be able to formally verify that the “Friendliness” that the AI provably possesses is actually what you want.
True about the possibility of the AGI trying to trick you. But from what I understand, the goal of SI is to come up with a verifiable FAI. You can specify whatever high standard of verifiability you want as the unboxing condition.
“You can specify whatever standard of verifiability you want” is vague. You can say “I want to be absolutely right about whether it’s Friendly”, but you can’t have that unless you know what Friendly means, and are smart enough to specify a standard for checking on it.
If you could be sure you had a cooperative AGI which could just give you an FAI, I think you'd have basically solved the problem of creating an FAI ... but that's the problem you're trying to get the AGI to solve for you.
That is true, but specifying the theorem to be proven is not always easy.
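A toy sketch of why the specification itself is the hard part (the functions here are hypothetical, not from any real verifier): a property like "the output is sorted" is easy to state and easy to check, but it is not the theorem you actually care about, and a degenerate "sorter" satisfies it.

```python
def is_sorted(xs):
    """The weak spec: every element is <= its successor."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogus_sort(xs):
    """Satisfies is_sorted() on every input -- by throwing the data away."""
    return []

def intended_spec(inp, out):
    """Closer to the real theorem: sorted AND a permutation of the input."""
    return is_sorted(out) and sorted(out) == sorted(inp)

data = [3, 1, 2]
print(is_sorted(bogus_sort(data)))            # True  -- the weak spec "verifies"
print(intended_spec(data, bogus_sort(data)))  # False -- the intent does not
```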
Verifying is hard. Specifying what an FAI is well enough that you've even got a chance of having your Unspecified AI developing one is a whole 'nother sort of challenge.
Are there convenient acronyms for differentiating between Uncaring AIs and AIs actively opposed to human interests?
I was assuming that xamdam’s AGI will invent an FAI if people can adequately specify it and it’s possible, or at least it won’t be looking for ways to make things break.
There’s some difference between Murphy’s law and trying to make a deal with the devil. This doesn’t mean I have any certainty that people can find out which one a given AGI has more resemblance to.
I will say that if you tell the AGI “Make me an FAI”, and it doesn’t reply “What do you mean by Friendly?”, it’s either too stupid or too Unfriendly for the job.