It seems clear that many high-level programming languages are candidates for use in the first AGI. They have enough power to write that code. It seems clear, however, that the power that those languages have is incompatible with formal safety. SPARK or OCaml are made so that it is easy to prove correctness, which seems useful, but that’s not enough.

For example, we might need memory safety to provide a formal guarantee that the program cannot directly modify the part of memory containing the reward function, or the calculation of the reward. On the other hand, it seems that Turing incompleteness, which allows guaranteeing that a program terminates, would not be necessary.

So—what other (extant or yet-to-be defined) types of language safety will be needed from a language to prevent a hypothetically provably safe AI from being unsafe in practice?

If you are going to include formal proofs with your AI showing that the code does what it’s supposed to, in the style of Coq and friends, then the characteristics of traditionally unsafe languages are not a deal-breaker. You can totally write provably correct and safe code in C, and you don’t need to restrict yourself to a sharply limited version of the language either. You just need to prove that you are doing something sensible each time you perform a potentially unsafe action, such as accessing memory through a pointer.

This slows things down and adds to your burdens of proof, but not really by that much. It’s a few more invariants you need to carry around with you throughout your proofs. Where in a safer language you may have to prove that your Foo list is still sorted after a certain operation, in C you will additionally have to prove that your pointer topology is still what you want after a certain operation. No big deal. In particular, a mature toolkit for proving properties about C programs will presumably have tools for automating away the 99% of trivial proof obligations involving pointer topology, leaving something for you to prove only when you are doing something clever.

For any such property that you can screw up, a language that will not allow you to screw it up in the first place will make your life easier and your proofs simpler, which is why OCaml is more popular as a vehicle for proven correct code than C. But if you are proving the correctness of every aspect of your program anyway, this is a pretty minor gain; the amount of stuff there is to prove about your object-level algorithms will be vastly greater than the requirements added by the lack of safety of your programming language. If only because there will be specialized tools for rapidly dealing with the latter, but not the former.

Not all those specialized tools exist just yet. Currently, the program correctness proof systems are pretty good at proving properties about functions [in the mathematical sense] operating on data structured in a way that mathematicians like to work with, and a bit rubbish at everything else people use software to do; it is no coincidence that Coq, Agda, Idris, Isabelle, and friends all work primarily with purely functional languages using some form of constructed types as a data model. But techniques for dealing with computing applications in a broader sense will have to be on the proof-technology roadmap sooner or later, if correctness proofs are ever going to fulfill their promise outside selected examples. And when they do, there will not be a big difference between programming in C and programming in OCaml as far as proving correctness is concerned.

tl;dr: I don’t think language safety is going to be more than a rounding error if you want to prove the correctness of a large piece of software down to the last bit, once all the techniques for doing that sort of thing at all are in place. The amount of program-specific logic in need of manual analysis is vastly greater than the amount of boilerplate a safe language can avoid.

I think the programming language could be key to a self-improving AI being able to prove that the new implementation achieves the same goals as the old one, as well as us humans being able to prove that the AI is going to do what we expect.

To me it seems like memory safety is price of entry but I expect the eventual language will end up needing to be quite friendly to static analysis and theorem proving. That probably means very restricted side effects and mutation, as well as statically checkable memory and compute limits. Possibly also taking hardware unreliability into account, although I have no idea how to do that.

The language should be easy to write code in — if it’s too hard to write the code, you’re going to be out-competed by unfriendly AI projects — but also easy to write and prove fancy types/static assertions/contracts, because humans are going to be needing to prove a lot of stuff about code in this language and it seems like the proofs should also be in the code. My current vision would be some combination of Coq, Rust and Liquid Haskell.

In the context of programming languages, “proof” means “machine-checkable proof from mathematical axioms”. While there is difficult work involved in bridging between philosophy and mathematics, a programming language is only going to help on the math side, and on the math side, verifying proofs (once written in programming-language form) is trivial.

I believe Ed Kmett is working on a language while at MIRI. Probably not specifically AI safety focused, he was working on it before he joined. But maybe worth looking into. I’m not sure if there’s much public info about it.

Yes, an unsafe AI cannot be boxed on unsafe hardware, not can any AI running on physical hardware be made immune to attacks—but those are very different questions. For this question, first we assume that a provably safe AI can be written, then I wanted to ask what language would be needed.

## [Question] What Programming Language Characteristics Would Allow Provably Safe AI?

It seems clear that many high-level programming languages are candidates for use in the first AGI. They have enough power to write that code. It seems clear, however, that the power that those languages have is incompatible with formal safety. SPARK or OCaml are made so that it is easy to prove correctness, which seems useful, but that’s not enough.

For example, we might need memory safety to provide a formal guarantee that the program cannot directly modify the part of memory containing the reward function, or the calculation of the reward. On the other hand, it seems that Turing incompleteness, which allows guaranteeing that a program terminates, would not be necessary.

So—what other (extant or yet-to-be defined) types of language safety will be needed from a language to prevent a hypothetically provably safe AI from being unsafe in practice?

If you are going to include formal proofs with your AI showing that the code does what it’s supposed to, in the style of Coq and friends, then the characteristics of traditionally unsafe languages are not a deal-breaker. You can totally write provably correct and safe code in C, and you don’t need to restrict yourself to a sharply limited version of the language either. You just need to prove that you are doing something sensible each time you perform a potentially unsafe action, such as accessing memory through a pointer.

This slows things down and adds to your burdens of proof, but not really by that much. It’s a few more invariants you need to carry around with you throughout your proofs. Where in a safer language you may have to prove that your Foo list is still sorted after a certain operation, in C you will additionally have to prove that your pointer topology is still what you want after a certain operation. No big deal. In particular, a mature toolkit for proving properties about C programs will presumably have tools for automating away the 99% of trivial proof obligations involving pointer topology, leaving something for you to prove only when you are doing something clever.

For any such property that you can screw up, a language that will not allow you to screw it up in the first place will make your life easier and your proofs simpler, which is why OCaml is more popular as a vehicle for proven correct code than C. But if you are proving the correctness of every aspect of your program anyway, this is a pretty minor gain; the amount of stuff there is to prove about your object-level algorithms will be vastly greater than the requirements added by the lack of safety of your programming language. If only because there will be specialized tools for rapidly dealing with the latter, but not the former.

Not all those specialized tools exist just yet. Currently, the program correctness proof systems are pretty good at proving properties about functions [in the mathematical sense] operating on data structured in a way that mathematicians like to work with, and a bit rubbish at everything else people use software to do; it is no coincidence that Coq, Agda, Idris, Isabelle, and friends all work primarily with purely functional languages using some form of constructed types as a data model. But techniques for dealing with computing applications in a broader sense will have to be on the proof-technology roadmap sooner or later, if correctness proofs are ever going to fulfill their promise outside selected examples. And when they do, there will not be a big difference between programming in C and programming in OCaml as far as proving correctness is concerned.

tl;dr: I don’t think language safety is going to be more than a rounding error if you want to prove the correctness of a large piece of software down to the last bit, once all the techniques for doing that sort of thing at all are in place. The amount of program-specific logic in need of manual analysis is vastly greater than the amount of boilerplate a safe language can avoid.

This is really helpful—thanks!

I think the programming language could be key to a self-improving AI being able to prove that the new implementation achieves the same goals as the old one, as well as us humans being able to prove that the AI is going to do what we expect.

To me it seems like memory safety is price of entry but I expect the eventual language will end up needing to be quite friendly to static analysis and theorem proving. That probably means very restricted side effects and mutation, as well as statically checkable memory and compute limits. Possibly also taking hardware unreliability into account, although I have no idea how to do that.

The language should be easy to write code in — if it’s too hard to write the code, you’re going to be out-competed by unfriendly AI projects — but also easy to write and prove fancy types/static assertions/contracts, because humans are going to be needing to prove a lot of stuff about code in this language and it seems like the proofs should also be in the code. My current vision would be some combination of Coq, Rust and Liquid Haskell.

Anything that you don’t want the AGI to do, you should ideally prove not that it can’t do it, but that it won’t do it. Correctness is enough for this.

In the context of programming languages, “proof” means “machine-checkable proof from mathematical axioms”. While there is difficult work involved in bridging between philosophy and mathematics, a programming language is only going to help on the math side, and on the math side, verifying proofs (once written in programming-language form) is trivial.

I believe Ed Kmett is working on a language while at MIRI. Probably not specifically AI safety focused, he was working on it before he joined. But maybe worth looking into. I’m not sure if there’s much public info about it.

Here’s a public github for coda, the language he’s been working on, with a bit written about it.

Basically all formal proofs assume that the hardware is perfect. Rowhammer.

Yes, an unsafe AI cannot be boxed on unsafe hardware, not can any AI running on physical hardware be made immune to attacks—but those are very different questions. For this question, first we assume that a provably safe AI can be written, then I wanted to ask what language would be needed.