Proof cert memo

In the Safeguarded AI programme thesis[1], proof certificates or certifying algorithms are relied upon in the theory of change. Let’s discuss!
From the thesis:
Proof certificates are a quite broad concept, introduced by [33]: a
certifying algorithm is defined as one that produces enough metadata
about its answer that the answer’s correctness can be checked by an
algorithm which is so simple that it is easy to understand and to
formally verify by hand.
The abstract of citation 33 (McConnell et al)[2]:
A certifying algorithm is an algorithm that produces, with each
output, a certificate or witness (easy-to-verify proof) that the
particular output has not been compromised by a bug. A user of a
certifying algorithm inputs x, receives the output y and the
certificate w, and then checks, either manually or by use of a
program, that w proves that y is a correct output for input x. In this
way, he/she can be sure of the correctness of the output without
having to trust the algorithm.
In this memo, I give an overview by distinguishing a proof cert from a constructive proof and taking a shallow dive into the paper. Then I ruminate on how this might apply in AI contexts.
What’s the difference between a proof cert and a constructive proof?
If a cert is just a witness, you might think that proof certs are just proofs from constructive math as distinct from classical math. For example, under some assumptions and caveats (in the calculus/topology setting, think Brouwer), a function can be “known” to have a fixed point without any guaranteed procedure for finding one, while under other assumptions and caveats (namely the lattice-theoretic setting, where Kleene-style iteration applies) we know there’s a fixed point because we have a guaranteed procedure for computing it. However, McConnell et al go further: a certifying algorithm produces with each output a cert/witness that the particular output has “not been compromised by a bug”. This isn’t a conceptual leap from constructive math, in which a proof is literally a witness-building algorithm; it looks to me a bit like a memo saying “btw, don’t throw out the witness”, along with notes about writing cheap verifiers that do not simply reimplement the input program’s logic.
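To make the lattice side concrete, here is a minimal sketch (my own illustration, not from the paper or the thesis) of Kleene iteration computing a least fixed point on a finite powerset lattice; the iteration procedure itself is the constructive content, and its trace is exactly the sort of thing a certificate can be read off of.

```python
# Least fixed point of a monotone function on the powerset lattice of a
# finite universe, via Kleene iteration: start at the bottom element (the
# empty set) and apply f until nothing changes. Monotonicity plus
# finiteness guarantees termination at the least fixed point.
def least_fixed_point(f, universe):
    current = frozenset()              # bottom of the lattice
    while True:
        nxt = f(current)
        assert nxt <= universe         # f must stay inside the lattice
        if nxt == current:             # f(x) == x: a fixed point, and the least one
            return current
        current = nxt

# Example: the nodes reachable from node 1 in a small graph, characterised
# as the least fixed point of s |-> {1} union successors(s).
edges = {1: {2}, 2: {3}, 3: set(), 4: set()}
f = lambda s: frozenset({1}) | frozenset(v for u in s for v in edges[u])
print(least_fixed_point(f, frozenset(edges)))   # the reachable set {1, 2, 3}
```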
One way of looking at a certificate is that it is built from metadata. This may simply be something read off of an execution trace, or some state that logs important parts of the execution trace in order to construct a witness, as in the bipartite test example. There, all you need is an algorithm that searches for a two-coloring or an odd cycle and, crucially, returns whichever it finds as a decoration on its boolean output (where “true” means “is bipartite”). Then a cheap verifier (or checker) is none other than the pre-existing knowledge that a valid two-coloring is equivalent to bipartiteness, and that the existence of an odd cycle disproves bipartiteness.
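As a sketch of that example (my own code, assuming undirected graphs given as adjacency-set dicts; the paper treats this more abstractly), a certifying bipartiteness test and its cheap checker might look like this:

```python
from collections import deque

# Certifying bipartiteness test: BFS either 2-colors the graph or finds an
# odd cycle, and returns the boolean answer decorated with that witness.
def certifying_bipartite(adj):
    color, parent = {}, {}
    for start in adj:
        if start in color:
            continue
        color[start], parent[start] = 0, None
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v], parent[v] = 1 - color[u], u
                    queue.append(v)
                elif color[v] == color[u]:
                    # Conflict edge: splice the two BFS-tree paths into an odd cycle.
                    return False, ("odd_cycle", _odd_cycle(u, v, parent))
    return True, ("two_coloring", color)

def _odd_cycle(u, v, parent):
    path_u, path_v = [u], [v]
    while parent[path_u[-1]] is not None:
        path_u.append(parent[path_u[-1]])
    while parent[path_v[-1]] is not None:
        path_v.append(parent[path_v[-1]])
    # Drop the shared tail above the lowest common ancestor.
    while len(path_u) > 1 and len(path_v) > 1 and path_u[-2] == path_v[-2]:
        path_u.pop()
        path_v.pop()
    return path_u + path_v[::-1][1:]   # u .. lca .. v, closed by the edge (v, u)

# Cheap checker: it only uses "valid 2-coloring <=> bipartite" and
# "odd cycle => not bipartite"; it never re-runs the search.
def check_witness(adj, answer, witness):
    kind, data = witness
    if answer and kind == "two_coloring":
        return all(u in data for u in adj) and \
               all(data[u] != data[v] for u in adj for v in adj[u])
    if not answer and kind == "odd_cycle":
        closed = all(data[(i + 1) % len(data)] in adj[data[i]] for i in range(len(data)))
        return closed and len(data) % 2 == 1
    return False
```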
Chapter 11 of [3] in fact discusses certification and verification supporting each other, which seems to relax the constraint that a cheap verifier shouldn’t simply reimplement the certifier logic.
The main difference between a standard proof in an applied type theory (like Lean or Coq) and a proof cert is kinda cultural rather than fundamental, in that proof certs prefer to emphasize single IO pairs while Lean/Coq proofs often like to emphasize entire input types. Just don’t get confused on page 34 when it says “testing on all inputs”: it means that a good certifying algorithm is a means to generate exhaustive unit tests, not that the witness predicate actually runs on all inputs at once. It seems like this picking out of only the input of interest, rather than quantifying over the whole input type, will be the critical consideration when we discuss making this strategy viable for learned components or neural networks.
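Read that way, the checker doubles as a test oracle: for any inputs you care to enumerate, you run the certifying algorithm and check the witness, without ever hand-writing an expected output. A small sketch, reusing the hypothetical certifying_bipartite and check_witness from above:

```python
from itertools import combinations

# "Exhaustive unit tests": every undirected graph on 4 labelled vertices.
# The checker, not a hand-written expected value, decides whether each run passes.
nodes = range(4)
possible_edges = list(combinations(nodes, 2))
for mask in range(2 ** len(possible_edges)):
    adj = {u: set() for u in nodes}
    for i, (u, v) in enumerate(possible_edges):
        if mask >> i & 1:
            adj[u].add(v)
            adj[v].add(u)
    answer, witness = certifying_bipartite(adj)
    assert check_witness(adj, answer, witness)
```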
Formally:
Skip this if you’re not super compelled by what you know so far about
proof certificates, and jump down to the section on learned programs.
Consider algorithms α : X → Y. A precondition ϕ : X → B and a postcondition ψ : X → Y → B form an IO-spec or IO-behavior. An extended output set Y⊥ = Y ∪ {⊥} is needed to account for when the precondition is violated. We also have a type W of witness descriptions. Finally, we describe the witness predicate 𝒲 : X → Y⊥ → W → B (script 𝒲, to keep it apart from the description type W), which says that the IO pair (x, y) is witnessed by w. So when ψ(x, y) holds, w must be a witness to that fact; otherwise w must be a witness to the violation. McConnell et al distinguish witness predicates from strong witness predicates: the former can only establish the disjunction ¬ϕ(x) ∨ ψ(x, y), while the latter knows which disjunct holds.
A checker for 𝒲 is an algorithm sending (x, y, w) ↦ 𝒲(x, y, w). Its desiderata / nice-to-haves are correctness, runtime linear in the size of the input, and simplicity.
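A minimal transcription of these types into Lean 4 (my own sketch, using Prop for B and Option Y for Y⊥; the naming is mine, not the paper’s):

```lean
-- Hypothetical transcription of the definitions above.

-- IO-spec: a precondition on inputs and a postcondition on IO pairs.
structure IOSpec (X Y : Type) where
  pre  : X → Prop
  post : X → Y → Prop

-- Y⊥ is modelled as Option Y, with `none` playing the role of ⊥.
-- A witness predicate: w certifies the (possibly ⊥) output for x.
def WitnessPred (X Y W : Type) : Type :=
  X → Option Y → W → Prop

-- Soundness relative to a spec: whenever w witnesses (x, some y), either
-- the precondition failed or the postcondition holds, i.e. ¬ϕ(x) ∨ ψ(x, y).
def Sound {X Y W : Type} (S : IOSpec X Y) (Wit : WitnessPred X Y W) : Prop :=
  ∀ x y w, Wit x (some y) w → ¬ S.pre x ∨ S.post x y

-- A strong witness predicate tags the witness with which disjunct it proves.
def StrongWitnessPred (X Y W : Type) : Type :=
  X → Option Y → Sum W W → Prop
```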
I’m not that confident this isn’t covered in the document somewhere, but McConnell et al don’t seem paranoid about false functions. If Alice (sketchy) claims that a y came from α on x, but she lied to Bob and in fact ran α′, there’s no requirement for witness predicates to have any means of finding out. (I have a loose intuition that a properly adversarial adaptation of proof certs would be possible on a restricted algorithm set, but would impose a lot of costs.) Notice that 𝒲 is with respect to a fixed x; something like W′ : ∀ (x : X), Y(x) → W(x) → B would be a different approach! We should highlight that a witness predicate only tells you about one input.
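To spell the contrast out (continuing the hypothetical Lean sketch above): checking one decorated run is a per-instance proposition, while the guarantee that an algorithm is certifying is a universally quantified statement, which is the kind of thing a Lean/Coq development would typically target.

```lean
-- Per-instance claim: this particular run (x, y, w) checks out.
def InstanceOK {X Y W : Type} (Wit : WitnessPred X Y W)
    (x : X) (y : Option Y) (w : W) : Prop :=
  Wit x y w

-- Whole-algorithm claim: every run of α comes decorated with a valid witness.
def Certifying {X Y W : Type} (Wit : WitnessPred X Y W)
    (α : X → Option Y × W) : Prop :=
  ∀ x, Wit x (α x).1 (α x).2
```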
Potential for certificates about learning, learned artefacts, and learned artefacts’ artefacts
The programme thesis continues:
We are very interested in certificates, because we would like to rely
on black-box advanced AI systems to do the hard work of searching for
proofs of our desired properties, yet without compromising confidence
that the proofs are correct. In this programme, we are specifically
interested in certificates of behavioural properties of cyber-physical
systems (ranging from simple deterministic functions, to complex
stochastic hybrid systems incorporating nuances like nondeterminism
and partiality).
If AIs are developing critical software (like airplanes, cars, and weaponry) and assuring us that it’s legit, introducing proof certs removes a trust assumption from the system. The sense in which I think Safeguarded AI primarily wants to use proof certs certainly doesn’t emphasize certs about SGD or architectures, nor does it even emphasize inference-time certs (though I think the programme would separately be excited about progress there). Instead, the target of proof certs seems to be an artefact at a second order of remove from the training loop, like a software artefact written by a model, where the artefact’s runtime is not the model’s inference time. I see proof certs in this context as primarily an auditing/interpretability tool. There’s no reason a model should write code that is human readable; constraining it to write human-readable code seems hard; letting the code be uninterpretable but forcing it to generate interpretable artefacts seems pretty good! The part I’m mostly not clear on is why proof certs would be better than anything else in applied type theory (where proofs are just lambdas, which are very decomposable: you can’t read them all at once, but they’re easy to pick apart) or model checking (where specs are logical formulae, even if reading the proof that an implementation obeys a spec is a little hairy).
It’s important that we’re not required to quantify over a whole type to get some really valuable assurances, and this proof cert framework is amenable to that. On the other hand, you can accomplish the same relaxation in Lean or Coq, and we can really only debate over ergonomics. I think ultimately the merits of this approach versus other approaches are going to be decided entirely on the relative ergonomics of tools that barely exist yet.
Overall impression
McConnell et al is 90 pages, and I only spent a few hours with it. But I don’t find it super impressive or compelling. Conceptually, a constructive proof gets me everything I want out of proof certificates. A tremendous value proposition would be to make these certs easier to obtain for real-world programs than what Coq or Lean can provide, but I haven’t seen those codebases yet. There’s more literature I may look at around proof-carrying code (PCC)[4], which is built on similar ideas, but I’m skeptical that I’ll be terribly compelled, since the insight “just decorate the code with the proof” isn’t very subtle or difficult. Moreover, this literature just seems kinda old, and I don’t see obvious paths from it to current research.
[1] https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf
[2] http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf
[3] http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf
[4] https://www.cs.princeton.edu/~appel/papers/pccmodel.pdf