Peter, thanks again for starting this discussion! Just a few caveats on your summary. We don’t depend on trustable AIs! One of the absolutely amazing and critical characteristics of mathematical proof is that anybody, trusted or not, can create proofs, and we can check them without any need to trust the creator. For example, MetaMath https://us.metamath.org/ defines an especially simple system for which somebody has written a 350-line Python proof checker that can check all 40,000 MetaMath theorems in a few seconds. We need to make sure that that small Python program is correct, but beyond that we don’t need to trust any AIs! It certainly would be nice to have trustable AIs, but I think it is a mistake to depend on them. We are already seeing a wide range of open-source AIs created by different groups with different intentions. And there are many groups working to remove the safety features of commercial Large Language Models and text-to-image models.
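To make the prover/checker separation concrete, here is a toy sketch in Python (not MetaMath itself; the function names and the factoring task are made up purely for illustration): an untrusted routine produces a certificate, and a few trusted lines verify it without trusting the producer.

```python
# Toy illustration of the "untrusted prover, trusted checker" pattern.
# An untrusted party (here, a brute-force search standing in for an AI)
# produces a certificate; a tiny trusted checker verifies it.

def untrusted_factor(n: int) -> tuple[int, int]:
    """Untrusted: claims to return a nontrivial factorization of n."""
    for p in range(2, n):
        if n % p == 0:
            return p, n // p
    raise ValueError("no nontrivial factors")

def trusted_check(n: int, cert: tuple[int, int]) -> bool:
    """Trusted: the only code we must audit; needs no trust in the prover."""
    p, q = cert
    return 1 < p < n and 1 < q < n and p * q == n

cert = untrusted_factor(91)
assert trusted_check(91, cert)       # a valid certificate verifies
assert not trusted_check(91, (2, 45))  # a bogus certificate is rejected
```

The point is the asymmetry: the checker is tiny and auditable, while the prover can be arbitrarily large, opaque, or untrustworthy.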
The core of our proposal is to put provable guardrails around dangerous technologies that AGI might seek to control. As many commenters have pointed out, we need to figure out what those are! Some current examples include DNA synthesis machines, nukes, military hardware of all kinds, drones capable of dispersing dangerous payloads, etc. And we need to create rules for which actions with these systems are safe! But we already do that today for human use: the CDC has detailed rules for biohazard labs. The initial implementation of our proposal will most likely be digitally checkable versions of those existing human rules, but we will certainly need to extend them to the new environment as AIs become more powerful. Any action remotely capable of causing human extinction should become a primary focus. I believe establishing the rules for high-risk actions should be an immediate priority for humanity, and extending those rules to other systems to enable human flourishing should be a close second.
We can use untrusted AIs for much of this work! We simply require them to accompany any designs or software they create with formal proofs of the required properties. These proofs can then be easily and rigorously checked, and the artifacts can be used without concern for the properties of the AI which generated them.
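As a tiny illustration of what "rigorously checked" means in practice, here is a trivial machine-checkable proof in Lean 4: the kernel accepts the theorem only if the supplied proof term is actually valid, no matter who or what produced it.

```lean
-- The Lean kernel re-checks this proof term from scratch; a wrong or
-- fabricated proof is simply rejected, so the prover needs no trust.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The same principle scales from this one-liner up to proofs about real software: checking is cheap and trustworthy even when proof generation is done by an opaque AI.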
Fortunately, for coarse “guardrails” the specs are pretty simple and can often be reused in many contexts. For example, all software we want to run should have proofs that: 1) there aren’t memory leaks, 2) there aren’t out-of-bounds memory accesses, 3) there aren’t race conditions, 4) there aren’t type violations, 5) there aren’t buffer overflows, 6) private information is not exposed by the program, 7) there aren’t infinite loops, etc. There should be a widely used “metaspec” for those criteria which most program-synthesis AIs will have to prove their generated code satisfies. Similarly, there are common constraints for many physical systems: e.g. robots, cars, planes, boats, etc. shouldn’t crash into things or harm humans. The more refined the rules are, the more subtle they become. To prevent existentially bad outcomes, I believe coarse constraints suffice. But we certainly eventually want much more refined models of the world and of the outcomes we seek.

I’m a fan of “Digital Twins” of physical systems, which allow rules and constraints to be run in simulation and can help in choosing specifications. We certainly want those simulations to be trusted, which can be achieved by proving that the code actually simulates the systems it claims to.

Eventually it would be great to have fully trusted AIs as well! Mechanistic Interpretability should be great for that, and I’m just now reading Anthropic’s recent nice advances in that area. If that work continues to make progress, it will make our lives much easier, but it doesn’t eliminate the need to ensure that misaligned AGI and malicious AGI don’t cause harm. The big win with the proof checking and the cryptographic hardware we propose is that we can ensure that even powerful systems will obey rules that humanity selects. If we don’t implement that kind of system (or something functionally equivalent), then there will be dangerous pathways which malicious AGI can exploit to cause great harm to humans.
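The “metaspec” idea above can be sketched very roughly in Python. This is only an illustration under assumptions: the property names and the `check_certificate` stub are hypothetical stand-ins for a real proof checker re-verifying supplied proof objects.

```python
# Sketch of a "metaspec" gate: AI-generated code is admitted for execution
# only if it carries a checked certificate for every required property.
# Property names and the certificate format are hypothetical.

METASPEC = [
    "no_out_of_bounds",
    "no_race_conditions",
    "no_buffer_overflows",
    "terminates",
]

def check_certificate(artifact: dict, prop: str) -> bool:
    # Stand-in for a real proof checker; here we just look the property
    # up in the artifact's set of certified properties.
    return prop in artifact.get("certified", set())

def admit(artifact: dict) -> bool:
    """Admit generated code only if every metaspec property is certified."""
    return all(check_certificate(artifact, p) for p in METASPEC)

good = {"code": "example", "certified": {"no_out_of_bounds",
        "no_race_conditions", "no_buffer_overflows", "terminates"}}
bad = {"code": "example", "certified": {"no_out_of_bounds"}}
assert admit(good)
assert not admit(bad)
```

The design point is that the gate itself stays small and fixed while the set of certified artifacts grows, mirroring the small-trusted-checker pattern from the MetaMath example.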