Formal confinement prototype

Link post

This whitepaper describes a preliminary approach, developed last summer, to using proof certificates in a secure program synthesis protocol.

Abstract

We would like to put the AI in a box. We show how to build an interface between the box and the world out of specifications written in Lean. It is the AI’s responsibility to provide a proof that its (restricted) output abides by the spec. The runnable prototype is at https://github.com/for-all-dev/formal-confinement.
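As a sketch of the idea (the names and types below are illustrative, not the prototype’s actual interface): a spec is a predicate relating input to output, and the only outputs that leave the box are ones bundled with a Lean proof that the spec holds.

```lean
-- Illustrative sketch: a spec relates input to output, and the box
-- only releases outputs that come bundled with a proof of the spec.
def Spec (α β : Type) := α → β → Prop

structure CertifiedOutput {α β : Type} (s : Spec α β) (a : α) where
  out  : β
  cert : s a out

-- Toy spec: the output must be the input doubled.
def doubleSpec : Spec Nat Nat := fun a b => b = 2 * a

-- The AI ships the value together with the proof; Lean is the checker.
def demo : CertifiedOutput doubleSpec 3 := ⟨6, rfl⟩
```

The point of the bundling is that the type checker, not the AI, is the trusted component: an output without a valid `cert` simply does not type-check.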

Related Work (excerpt)

In the seminal “A Note on the Confinement Problem”, Lampson (1973) states confinement rules for reasoning about when a program is or isn’t confined.

  1. Total isolation or transitivity: either the program calls no other program, or every program it calls is also confined.

  2. Masking and enforcement: all inputs (including side channels) must be fully specified and enforced by the caller, and input to covert channels must conform to the caller’s specifications.

Building on Lampson, Yampolskiy 2012 considers confinement of AI, especially superintelligence.

“In the hopes of starting a new subfield of computer security, AI Safety Engineering, we define the Artificial Intelligence Confinement Problem (AICP) as the challenge of restricting an artificially intelligent entity to a confined environment from which it can’t exchange information with the outside environment via legitimate or covert channels if such information exchange was not authorized by the confinement authority.”

The present formal confinement work restricts Yampolskiy’s AICP to the Imp programming language, where the confinement authority is the user, who provides a precondition and a postcondition backed by Lean. In fact, the setting is so restricted that we meet Lampson’s rules almost for free.
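Concretely, the user’s precondition/postcondition pair gives the proof obligation the shape of a Hoare triple. A minimal Lean sketch, assuming an Imp-like language cut down to a single assignment command (all names here are illustrative, not the prototype’s):

```lean
-- Illustrative sketch: states map variable names to numbers; the user
-- (the confinement authority) fixes a precondition and a postcondition.
abbrev State := String → Nat

-- A tiny Imp-like command language (assignment of a constant only).
inductive Com where
  | assign (x : String) (v : Nat)

def step : Com → State → State
  | .assign x v, s => fun y => if y = x then v else s y

-- Hoare triple: if the precondition holds before, the postcondition holds after.
def hoare (P : State → Prop) (c : Com) (Q : State → Prop) : Prop :=
  ∀ s, P s → Q (step c s)

-- The obligation the AI must discharge before its program leaves the box.
example : hoare (fun _ => True) (.assign "x" 7) (fun s => s "x" = 7) := by
  intro s _
  simp [step]
```

Because the language is this small, Lampson’s first rule (total isolation) holds trivially: an Imp program has no facility for calling another program at all.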

Verification burden

In our toy setting, we report verification burdens (the ratio of the cognitive cost of shipping the proof to the cost of the already-shipped program) between 1.5 and 4.2.
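To make the metric concrete, here is a hypothetical worked example; the numbers and the idea of measuring effort in abstract units are illustrative, not taken from the prototype’s measurements.

```python
def verification_burden(proof_cost: float, program_cost: float) -> float:
    """Ratio of the cost of producing the proof to the cost of the
    already-shipped program, in any consistent unit of effort."""
    return proof_cost / program_cost

# Hypothetical numbers: a program costing 10 units whose proof costs
# 30 units has burden 3.0, within the reported 1.5-4.2 range.
print(verification_burden(30, 10))
```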