Peter, thanks again for starting this discussion! Just a few caveats on your summary. We don’t depend on trustable AIs! One of the absolutely amazing and critical characteristics of mathematical proof is that anybody, trusted or not, can create proofs, and we can check them without any need to trust the creator. For example, MetaMath https://us.metamath.org/ defines an especially simple proof system, and somebody has written a 350-line Python proof checker that can check all 40,000 MetaMath theorems in a few seconds. We need to make sure that small Python program is correct, but beyond that we don’t need to trust any AIs! It would certainly be nice to have trustable AIs, but I think it is a mistake to depend on them. We are already seeing a wide range of open-source AIs created by different groups with different intentions. And there are many groups working to remove the safety features of commercial large language models and text-to-image models.
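To make this concrete, here is a toy sketch of the stack-based checking step at the heart of a Metamath verifier. It is deliberately simplified (a real checker also handles compressed proofs, frames, scoping, and disjoint-variable conditions), and the data layout is invented for illustration:

```python
# Toy sketch of the stack-based proof step used by Metamath verifiers.
# Heavily simplified; the data layout here is invented for illustration.

from dataclasses import dataclass

Expr = tuple  # an expression is a tuple of symbols, e.g. ('wff', 'P')

@dataclass
class Assertion:
    floating: list    # [(typecode, variable), ...] -- binds each variable
    essential: list   # [Expr, ...] -- hypotheses that must match
    conclusion: Expr

def substitute(expr, subst):
    """Replace each variable symbol in expr by its assigned expression."""
    out = []
    for sym in expr:
        out.extend(subst.get(sym, (sym,)))
    return tuple(out)

def apply_step(stack, a):
    """Pop hypotheses, infer the substitution, check it, push the conclusion."""
    n = len(a.floating) + len(a.essential)
    args, rest = stack[len(stack) - n:], stack[:len(stack) - n]
    subst = {}
    for (typecode, var), popped in zip(a.floating, args):
        assert popped[0] == typecode, f"type mismatch for {var}"
        subst[var] = popped[1:]  # bind var to the body of the popped expression
    for hyp, popped in zip(a.essential, args[len(a.floating):]):
        assert substitute(hyp, subst) == popped, "hypothesis mismatch"
    return rest + [substitute(a.conclusion, subst)]

# Example: modus ponens. Floating hyps type-check P and Q; essential hyps
# demand proofs of P and ( P -> Q ); the conclusion is a proof of Q.
ax_mp = Assertion(
    floating=[('wff', 'P'), ('wff', 'Q')],
    essential=[('|-', 'P'), ('|-', '(', 'P', '->', 'Q', ')')],
    conclusion=('|-', 'Q'))
stack = [('wff', 'p'), ('wff', 'q'),
         ('|-', 'p'), ('|-', '(', 'p', '->', 'q', ')')]
print(apply_step(stack, ax_mp))  # [('|-', 'q')]
```

The key point is that nothing in this loop cares where the proof came from: a proof either drives the stack to the stated theorem or it doesn’t.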
The core of our proposal is to put provable guardrails around dangerous technologies that AGI might seek to control. As many commenters have pointed out, we need to figure out what those are! Some current examples include DNA synthesis machines, nukes, military hardware of all kinds, drones capable of dispersing dangerous payloads, etc. And we need to create rules for which actions with these systems are safe! But we already do that today for human use: the CDC has detailed rules for biohazard labs. The initial implementation of our proposal will most likely be digitally checkable versions of those existing human rules. But we will certainly need to extend them to the new environment as AIs become more powerful. Any action remotely capable of causing human extinction should become a primary focus. I believe establishing the rules for high-risk actions should be an immediate priority for humanity, and extending those rules to other systems to enable human flourishing should be a close second.
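To give a flavor of what a “digitally checkable rule” might look like, here is a deliberately tiny, hypothetical sketch; the policy, names, and signature set below are invented, and real screening rules (like the CDC’s) are far more detailed:

```python
# Hypothetical sketch of a "digitally checkable" rule a DNA synthesizer's
# firmware might enforce before acting. The policy and names are invented;
# real sequence-screening rules are far more detailed than this.

HAZARDOUS_MOTIFS = {"SIG_A", "SIG_B"}  # stand-ins for screened signatures

def order_permitted(sequence: str, has_signed_authorization: bool) -> bool:
    """Allow an order only if it violates no encoded rule."""
    flagged = any(motif in sequence for motif in HAZARDOUS_MOTIFS)
    return (not flagged) or has_signed_authorization

# The machine executes an order only when order_permitted(...) returns True,
# so the rule is enforced mechanically rather than by operator judgment.
```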
We can use untrusted AIs for much of this work! We simply require them to generate formal proofs of correctness for any designs or software they create. These proofs can be easily and rigorously checked, and the verified designs used without concern for the properties of the AI that generated them.
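A minimal sketch of that workflow, with placeholder names (generate_candidate and check_proof are hypothetical, not real APIs):

```python
# Minimal sketch of the trust model: a powerful untrusted generator paired
# with a small trusted checker. generate_candidate and check_proof are
# placeholders, not real APIs.

def produce_verified_artifact(spec, untrusted_ai, check_proof):
    artifact, proof = untrusted_ai.generate_candidate(spec)  # untrusted step
    if check_proof(spec, artifact, proof):  # small, auditable trusted step
        return artifact  # usable regardless of the generator's intentions
    raise ValueError("proof rejected: discard the artifact, not the checker")
```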
That would be neat! I’m overall quite excited about this approach, even though there are quite a few details to iron out. My main skepticism (as harfe pointed out before) is indeed about how to specify the things we care about in a formal format that can then be formally verified. Do you know of any ongoing (or past) efforts to convert natural-language specifications into formal ones?
I’ve heard of formal verification efforts at NASA where they gather a group of domain experts who, using a standardized template, write down the safety specifications of a spacecraft. Formal-methods researchers then invented a logic expressive enough to encode these specifications and formally verified them.
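For a flavor of what that can look like, here is a hypothetical example in the same spirit (both the templated requirement and its encoding are invented for this sketch):

```python
# Hypothetical illustration of a templated requirement and one plausible
# encoding in metric temporal logic (MTL). Both are invented for this sketch.

requirement = ("In descent mode, whenever altitude < 100 m, "
               "the lander shall reach a safe-thrust state within 2 s.")

# G = "always", F[0,2] = "eventually, within 2 seconds":
mtl_formula = "G( (descent & altitude_below_100m) -> F[0,2] safe_thrust )"
```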
I understand how we can avoid trusting an AI if we’ve got a specification that the proof checker understands.
Where I expect to need an AI is for generating the right specifications.