AI is allowed to write the code and proof, but not the proof-checker.
Some examples from the paper of things to be proved are “this plane will not fly into buildings” (p.3), “this drone flies safely” (p.2), and “this social media account is human” (p.2). Somehow a formal specification is made of such things, and the proof-checker verifies that the proof indeed proves that the code satisfies the formal specification.
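To make the division of labour concrete, here is a toy example (mine, not the paper’s) of what “code, specification, proof” could look like in a proof assistant such as Lean. The program and the proof are exactly the artefacts an AI would be allowed to write; the trusted kernel only checks that the proof establishes the stated theorem.

```lean
-- Toy illustration, not from the paper. The "program" clamps a reported
-- altitude to a hard floor; the "specification" is the theorem statement;
-- the "proof" is what the kernel checks against that statement.
def clampAltitude (floor alt : Nat) : Nat :=
  if alt < floor then floor else alt

-- Specification: the output never falls below the floor.
theorem clampAltitude_safe (floor alt : Nat) :
    floor ≤ clampAltitude floor alt := by
  unfold clampAltitude
  split <;> omega
```

The kernel will reject a bogus proof, but it has no opinion on whether `floor ≤ clampAltitude floor alt` was the property anyone actually wanted.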
The only suggestion the paper makes about how to produce such a formal specification is to cite its reference 94, “Autoformalization with Large Language Models”. I have not read it, but the title implies that its answer is to have an AI do it. Where do you get an AI that is safe enough to be trusted with that?
Putting the two papers together, the scheme is:
1. Give one AI a specification like “no-one can fly this plane into a building”.
2. Have the AI come up with a formal specification of what that means.
3. A second AI generates avionics software to satisfy that specification and a purported proof that it does so.
4. Run the specification, the code, and the proof through a dumb proof-checker.
5. The code has been proved safe! Run it.
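Spelled out as code, the trust structure looks something like this. This is purely my sketch of the scheme; every function here is a placeholder, not an API from either paper.

```python
# Sketch of the pipeline above. Only check_proof() is meant to be trusted;
# the two AIs can be arbitrarily unsafe because their output is gated.

def formalize(requirement: str) -> str:
    """AI #1: turn an English requirement into a formal specification."""
    raise NotImplementedError  # the "autoformalization" step (reference 94)

def synthesize(spec: str) -> tuple[str, str]:
    """AI #2: produce candidate code plus a purported proof of the spec."""
    raise NotImplementedError  # untrusted; its output is checked below

def check_proof(spec: str, code: str, proof: str) -> bool:
    """The dumb proof-checker: the only human-written, trusted component."""
    raise NotImplementedError  # e.g. the kernel of a proof assistant

def build(requirement: str) -> str:
    spec = formalize(requirement)           # step 2: nothing checks this
    code, proof = synthesize(spec)          # step 3: untrusted, by design
    if not check_proof(spec, code, proof):  # step 4: the safety gate
        raise RuntimeError("proof rejected; do not run the code")
    return code                             # step 5: "proved safe", relative to spec
```

The gate at step 4 only relates the code to the formal specification; whatever went wrong at step 2 passes straight through it.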
The second AI does not have to be safe, because the code it generates will not be run unless it passes step 4. But what about the first AI? The entire safety issue is still there. How can we assure ourselves that the formal specification that it generates actually describes what we really wanted? Especially if “no-one can fly this plane into a building” turns into a vast pile of mathematics as impenetrable as any neural network’s weight matrices.
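To see how that gap can hide in plain sight, here is a made-up formalization (again mine, not the paper’s) that sounds like the English requirement but quietly bakes in assumptions no proof-checker can examine:

```lean
-- Made-up illustration of a spec-level loophole. `sensedPosition` and
-- `buildingMap` are whatever models the formalizing AI chose; the kernel
-- can check proofs about them, but not whether they match the real world.
variable (State : Type) (sensedPosition : State → Int × Int)
variable (buildingMap : Int × Int → Prop)

def neverHitsBuilding (run : Nat → State) : Prop :=
  ∀ t : Nat, ¬ buildingMap (sensedPosition (run t))
```

A proof of `neverHitsBuilding` is worth nothing if `buildingMap` omits a building, or if `sensedPosition` is where the avionics believes the plane is rather than where it actually is, and nothing in steps 3–5 can catch that.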
How do you stop someone doing the same with a different spec, “this plane will fly into any building I tell it to”?