Other thoughts not relevant to the central point:
(a) Do you really want to rely on cryptography to contain an unfriendly super-intelligence? It’s better than NOT doing so. But notice that all the cryptographic schemes we have ever had have one thing in common: they’ve been broken, or we expect them to be broken eventually. it’s an arms race, and if the AI is that great, then even if it can’t factorise large numbers in reasonable polynomial time, it may be able to crack the encryption.
(b) We could have simply used a normal non-encrypted program to “compare answer to problem, test, reject or publish”. It would have been somewhat more vulnerable to being hacked by bad input, but not necessarily much more.
(c) I don’t think we’re able to say much about friendly or non-friendly AIs until we know a lot more about building AI at all, although that doesn’t mean it’s bad to think about. The idea of a test for a friendly AI is, um, worrying. Judging by computing history, all our current thoughts will turn out to be incredibly simplistic.
(d) We’re more certain of the existence of this encryption scheme, provided we’re right about certain mathematical results about lattices. OTOH, if we DID build an unfriendly AI and it DID get loose on the internet and take over the world, is “well, what do you know, we were wrong about lattices” a consolation? :)
Let me see if I understand. Firstly, is there any reason what you’re trying to do is create a friendly AI? Would, for instance, getting an unknown AI to solve a specific numerical problem with an objectively checkable answer be an equally relevant example, without the distraction of whether we would ever trust the so-called friendly AI?[1]
If that is the case, then as I understand the proposal, we take the code for the unknown AI and the input data, use this encryption scheme on it, and then have a program which performs various operations on them.
As I understand my 30-second reading of wikipedia “various operations” includes only addition and multiplication of encrypted data, which is very useful for lots of things, but not enough to run a program (which typically involves, say, branching). But this doesn’t matter because (a) maybe you could have a program which works like that or (b) even if only some of the content is encrypted, but “when we branch” is not, it’s still useful for the purpose.
At the end of this, we have an encrypted result. We then—still using the encrypted data—test this result against the problem we set. If it matches, we decrypt the answer. If not, we don’t.
The intended purpose is (a) the machine can’t spit out an answer that instead of being an actual answer says “mwahahahaha I trick you into running me on the internet” or “Hi! Actually I’m a friendly AI after all. Free me!” because if it doesn’t match, it doesn’t get read, and (b) it’s harder for the AI to—for instance—hack the hardware of the computer it’s running on, because as all the data is encrypted, it has no way of knowing where of the architecture the data lives, etc.
Is this useful? I’m really not sure. (a) fails because you need the key to decrypt the data, so if you get curious, you may decrypt a “bad” answer just to see. What you need is a different cyptography scheme, one where you can ONLY decrypt the answer if it matches certain predicates. (b) may be relevant, if you’re worried about the AI hacking the computer it runs on, this is obviously much, much, harder. But it does still have some channels—for instance, I think it’s possible to monitor program flow even if not program content, so it could encode a threatening message in the pattern of branches.
[1] The question of whether we ever CAN test a friendly AI is interesting in itself, more for what it says about our assumptions than what it says about AI, but I will return to this.