Here is a little idea I had at today’s FHI workshop on how encryption & zero-knowledge proof-style approaches might help reduce risks from AI arms races, by using homomorphic encryption to allow safe comparison of countries’ respective AI prowess.
In an AI arms race between two large countries (say, China & the USA), lack of information about each other’s capabilities is highly destabilizing: overestimates and underestimates can both lead to precipitate actions. One way to defuse tensions would be something equivalent to a nuclear arms inspection regime. But while plutonium & uranium cores and rockets can, with care, be inspected, verified to exist, & verifiably destroyed without revealing too many secrets, it’s harder to see how any such arrangement could work for AI. AI software leaves no radionuclides in the environment to be detected, emits no radiation, and depends on no raw materials produced only in bulky, expensive, and conspicuous facilities like nuclear power plants; it produces nothing but waste heat. In general, one can prove possession of a cutting-edge AI (simply provide a copy of the software and let the other side run it on their own computers), but one cannot prove its absence. And proving possession is itself problematic: if your AI is better, you certainly don’t want to just hand it to the enemy.
One thought that came to mind about this: we could use cryptography to arrange head-to-head comparisons between countries’ best AIs without leaking source code or the size of the delta.
In Yao’s millionaires’ problem, two millionaires wish to compare their wealth w1 and w2 without revealing anything other than whether w1 > w2 or vice-versa. In our problem, we could take w1 and w2 to be the two AIs’ losses on a particular suite of tasks; this avoids leaking the exact losses and performance of each. But what about being forced to reveal the AIs themselves?
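To fix ideas, here is a minimal sketch of Yao’s original 1982 protocol, under the simplifying assumption that both scores fall in a small known range 1..N; the RSA parameters are toy-sized and utterly insecure, and a real deployment would use full-size keys or, more likely, modern garbled-circuit protocols:

```python
# Toy sketch of Yao's 1982 millionaires' protocol (insecure parameters!).
# Alice holds i, Bob holds j, both known to lie in 1..N; the protocol
# reveals only whether i >= j.
import random

SMALL_PRIMES = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
                151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199]

def yao_compare(i, j, N=10):
    """Return True iff i >= j, leaking nothing else (in the ideal case)."""
    # Alice's toy RSA keypair (real keys would be >= 2048 bits).
    p, q = 1009, 1013
    n, e = p * q, 65537
    d = pow(e, -1, (p - 1) * (q - 1))
    # Bob: pick a random secret x, encrypt it under Alice's key, mask with j.
    x = random.randrange(2, n)
    c = pow(x, e, n)
    m = (c - j) % n                               # Bob sends m to Alice
    # Alice: decrypt m+u for every candidate u; the j-th entry is Bob's x.
    ys = [pow((m + u) % n, d, n) for u in range(1, N + 1)]
    # Alice: reduce mod a random small prime, retrying until all residues
    # are >= 2 apart (circularly), so the +1 offsets below cannot collide.
    while True:
        p2 = random.choice(SMALL_PRIMES)
        zs = [y % p2 for y in ys]
        if all(min(abs(a - b), p2 - abs(a - b)) >= 2
               for k, a in enumerate(zs) for b in zs[k + 1:]):
            break
    # Alice: keep the first i residues, bump the rest by 1; send to Bob.
    ws = [z if u < i else (z + 1) % p2 for u, z in enumerate(zs)]
    # Bob: his slot still equals x mod p2 exactly when j <= i.
    return ws[j - 1] == x % p2

assert yao_compare(7, 3) is True    # 7 >= 3
assert yao_compare(2, 9) is False   # 2 <  9
```

In practice one would compare fixed-point encodings of the two losses, but the one-bit output is the point.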
NN AIs are based on simple primitives which can be handled efficiently, and turn out to be feasible to run under homomorphic encryption: “CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy”.
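As a toy illustration of the idea (not the CryptoNets construction itself, which used a leveled FHE scheme so it could also evaluate squaring activations), here is a hand-rolled Paillier cryptosystem, which is additively homomorphic, evaluating a linear layer on encrypted activations with insecure demo-sized parameters:

```python
# Toy Paillier (additively homomorphic) evaluation of a linear layer on
# encrypted inputs. Parameters are insecure demo sizes.
import math
import random

def paillier_keygen():
    p, q = 104729, 104723                # toy primes; real keys ~1024+ bits
    n = p * q
    lam = math.lcm(p - 1, q - 1)         # requires Python 3.9+
    g, n2 = n + 1, n * n
    # L(x) = (x - 1) // n;  mu = L(g^lam mod n^2)^-1 mod n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def he_dot(pk, enc_xs, ws):
    """Dot product with public weights: prod E(x_k)^w_k = E(sum w_k * x_k).
    (Negative weights would need an encoding mod n; omitted here.)"""
    n, _ = pk
    n2 = n * n
    acc = encrypt(pk, 0)
    for c, w in zip(enc_xs, ws):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

pk, sk = paillier_keygen()
xs, ws = [3, 1, 4], [2, 7, 5]            # secret activations, public weights
enc_xs = [encrypt(pk, x) for x in xs]
assert decrypt(pk, sk, he_dot(pk, enc_xs, ws)) == 33   # 2*3 + 7*1 + 5*4
```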
The data in the test suite (itself encrypted) probably shouldn’t be specified ahead of time, due to overfitting concerns, but it can be verifiably randomly generated on the fly using hash precommitment: XOR together a random seed from each country and feed the result into a PRNG.
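Concretely, the commit-reveal coin flip might look like the following sketch (the million-item “task pool” both sides draw from is an assumption for illustration):

```python
# Sketch of jointly generating the test suite via hash precommitment.
import hashlib
import random
import secrets

def commitment(seed: bytes, nonce: bytes) -> str:
    """Binds a party to its seed without revealing it."""
    return hashlib.sha256(nonce + seed).hexdigest()

# Step 1: each country picks a secret seed and publishes only its hash.
seed_us, nonce_us = secrets.token_bytes(32), secrets.token_bytes(32)
seed_cn, nonce_cn = secrets.token_bytes(32), secrets.token_bytes(32)
published = {"us": commitment(seed_us, nonce_us),
             "cn": commitment(seed_cn, nonce_cn)}

# Step 2: once both hashes are public, seeds & nonces are revealed and each
# side checks the other's reveal against the earlier commitment.
assert commitment(seed_us, nonce_us) == published["us"]
assert commitment(seed_cn, nonce_cn) == published["cn"]

# Step 3: XOR the seeds. Neither side can choose the result unilaterally,
# and it is unpredictable so long as at least one seed was honestly random.
joint_seed = bytes(a ^ b for a, b in zip(seed_us, seed_cn))

# Step 4: both sides run the same PRNG on the joint seed, so they draw the
# same test items from the agreed task pool.
rng = random.Random(joint_seed)
test_items = [rng.randrange(1_000_000) for _ in range(1_000)]
```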
Since it’s all homomorphically encrypted, both countries can run their own copy of the testing without needing to trust the other to report the results truthfully.
So the full protocol could be: each country provides a homomorphically-encrypted version of its best AI to the other, which the recipient runs on the test suite of problems, with the loss functions computed (again, under homomorphic encryption), and the comparison of the losses is the only result: which country has the better or worse AI, without revealing exactly how much, or anything about the AI’s implementation.
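Putting the pieces together, the whole exchange might be organized like the sketch below; the homomorphic layer is faked with identity-function stand-ins so the control flow stays runnable, and the two “AIs” are hypothetical toy scorers:

```python
# End-to-end protocol flow (HE layer faked with stand-ins; see the sketches
# above for the real commit-reveal and comparison steps).
import random

def he_encrypt_model(model):           # stand-in: would homomorphically
    return model                       # encrypt the network's weights

def he_loss(enc_model, tests):         # stand-in: would evaluate the loss
    return sum(enc_model(t) for t in tests)  # entirely under encryption

def ai_us(task):                       # hypothetical toy models: lower
    return (task * 17) % 101           # summed score = lower loss = better

def ai_cn(task):
    return (task * 29) % 103

# 1. Jointly generate the test suite (seed from the commit-reveal step).
rng = random.Random(b"joint seed from commit-reveal")
tests = [rng.randrange(1_000_000) for _ in range(1_000)]

# 2. Exchange encrypted models; each side runs both on the same suite.
enc_us, enc_cn = he_encrypt_model(ai_us), he_encrypt_model(ai_cn)
loss_us, loss_cn = he_loss(enc_us, tests), he_loss(enc_cn, tests)

# 3. A secure comparison (millionaires' protocol) releases only one bit.
print("US AI ahead" if loss_us < loss_cn else "Chinese AI ahead")
```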
What about incentives?
A country which doesn’t participate learns nothing about the other, and vice-versa.
A country could submit a lousy AI, but then it will probably learn the result ‘worse’, and what does this mean? If the second country submitted its best AI, the result is uninformative, because the first country was already almost certain that a lousy AI would lose to the second country’s best; and if the second country likewise submitted a lousy AI, the result is equally uninformative. If the first country submits its best AI, however, and it learns the result ‘worse’, then it has learned something very important: the second country has at least one AI, and possibly many AIs, better than its best, and the first country should avoid war. If it learns ‘better’, then it could be ahead (if the second country submitted its best) but also might not be (if the second country submitted something worse). Thus, the first country learns something useful only if it submits its best AI, and the same reasoning holds symmetrically for the second country. So one might expect both to participate honestly, submit their best AIs, learn who is ahead, more accurately gauge their relative strengths, and reduce the uncertainty & hence the risk of war rather than peaceful accommodation.
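This case analysis can be made mechanical. The toy enumeration below (with the obvious simplifying assumption that a deliberately lousy AI loses to anything) checks, for each strategy the first country might pick and each result it might see, which underlying situations remain consistent:

```python
# Enumerate what country A can infer from (its strategy, observed result).
from itertools import product

RANKINGS = ("A_best_ahead", "B_best_ahead")   # who is truly ahead
STRATS = ("best", "lousy")                    # what a country submits

def result_for(a_strat, b_strat, ranking):
    """Comparison outcome from A's side; a lousy AI loses to anything, and
    lousy-vs-lousy is treated as an uninformative toss-up (None)."""
    if a_strat == "lousy" and b_strat == "lousy":
        return None
    if a_strat == "lousy":
        return "worse"
    if b_strat == "lousy":
        return "better"
    return "better" if ranking == "A_best_ahead" else "worse"

for a_strat, result in product(STRATS, ("better", "worse")):
    worlds = [(b, r) for b, r in product(STRATS, RANKINGS)
              if result_for(a_strat, b, r) == result]
    print(f"A submits {a_strat:5}, sees {result!r:8} -> consistent: {worlds}")

# Only the cell (A submits best, sees 'worse') pins the ranking down: every
# consistent world has B's best ahead. Every other cell either cannot occur
# or leaves both rankings possible.
```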
Of course, a country reasoning like this could try to submit a deliberately bad AI while expecting the other to submit its best, hoping to trick the other country into overconfidence. But it does so at the risk of that overconfidence starting a war, and it denies itself the chance to check how well the other country has done: it is assuming it already knows how well the other country is doing and doesn’t need the information. And if it really has gauged the other country so accurately, the other country probably knows that the first country knows it is doing much worse, in which case the first country has no need of the arms-control measure and any ‘victory’ is empty trickery (and it might as well not participate at all).
So overall, it looks like countries are better off participating honestly than not participating, and better off not participating than participating dishonestly, and the effects are pro-peace.
Interesting, but unclear that the sort of benchmarking useful for this is also the sort of thing that determines AI military usefulness.
I think you’ve got a malformed link here.
Yeah, looks like my accidental Wikipedia interwiki broke the next link.
Two comments.
One, do you think a “homomorphically-encrypted version of their best AI” is a viable thing? As far as I know, homomorphically-encrypted software is very, very, very slow. By the time a homomorphically-encrypted version completes its AI-level tests, it might well be obsolete.
Second, nuclear inspection regimes and such have the goal of verifying a cap on capabilities: usually you are not allowed to have more than X missiles or Y kg of enriched uranium. But that’s not the information which Yao’s problem provides. Imagine that during the Cold War, all the US and the USSR could know was whether one side’s nuclear arsenal was better than the other side’s. That doesn’t sound stabilizing at all to me.
Yes. See the reference. Even a 10 or 100x computation cost increase would be acceptable for top-level national security purposes like this.
That sounds very stabilizing to me. ‘We must prevent a missile gap!’
Which reference? I’m not talking about the millionaires’ problem, I’m talking about executing homomorphic code.
One side thinks this and so accelerates the arms race. The other side thinks “This is our chance! We must strike while we know we’re ahead!” :-/