AI security via formal methods
Quinn
Can We Secure AI With Formal Methods? November-December 2025
Is there a quick link I can point someone to if they don’t speak Berkeley-ese and I want to say “Bayes points”?
I spent a long time spinning my wheels calculating the “scenarios I’d most like to mitigate” and the “1-3 potentially most effective interventions” before noticing that not specializing was slowly killing me. Agree that this is the hard part, but my current guess is that at least some people should do the 80/20 by identifying a top 5 and then throwing dice.
I have to make a post about specialization, the value it adds to your life independent of the choice of what to specialize in, etc.
Agree that property-based testing (PBT) is the control group! Agree that PBT is the 80/20 for most people most of the time.
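For anyone who hasn’t seen PBT in practice, here is a minimal sketch using Python’s Hypothesis library (the round-trip property and function names are my own illustration, not from the thread): you state an executable property and the framework searches for counterexamples, which is the cheap 80/20 relative to a full formal proof.

```python
# Minimal property-based testing sketch with Hypothesis (pip install hypothesis).
# The function under test and the property are illustrative examples only.
from hypothesis import given, strategies as st

def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated characters into (char, count) pairs."""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

@given(st.text())
def test_roundtrip(s: str) -> None:
    # The spec is an executable property; Hypothesis searches for counterexamples.
    assert run_length_decode(run_length_encode(s)) == s
```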
Formal confinement prototype
This worked for me!
Great, thanks. Yeah, I don’t have a preexisting sequence yet, so this doesn’t work for me.
I checked for both of those! Though the second time I found the edit-profile part of “account settings”. That one is totally on me!
New sequence: again, not seeing it on my profile. I Ctrl+F for “sequence” and get nothing.
I’m a bad user, but I’m looking for two buttons on my profile and can’t find them.
I would like to edit my user profile
I would like to form a Sequence out of two of my blog posts.
Please Measure Verification Burden
I think probably, yeah. There are some irons in the fire about this; watch this space, etc.
Sorry about the spam from my profile. The automated RSS ingest freaked out when I changed my Substack domain and sent all those duplicates.
Is there a search feature on LessWrong for, like, “Anna Salamon has a sentence about Burning Man”? So something like

from:@annasalamon burning man

which greps through any Anna Salamon post that mentions Burning Man? It was about the contrast between “fixing electronics in all the dust for your friend’s art installation” and “expanding your mind, maybe with drugs”, and how the physics-bound task keeps you from spiraling into insanity.
Social graph density leads to millions of acquaintances and few close friends, because you don’t need to treasure each other.
Do you (or any commenter I haven’t read) account for project-management gains from Claude? I’m imagining that aggregating, summarizing, and ranking bug reports internally at OpenAI and Anthropic are worlds different these days from what most PM and QA departments have historically been like. I think these considerations should be part of estimating the “90%” figure properly, though that may be more holistic than the literal words Dario used in his prediction.
Beliefs about formal methods and AI safety
Part of the implementation choices might be alienating. The day after I signed, the announcement email said “Let’s take our future back from Big Tech,” and maybe a lot of people on the fence, who work at large tech companies, don’t like that brand of populism.
July-October 2025 Progress in Guaranteed Safe AI
epistemic status: jotting down casually to make sure I have a specific “I Told You So” hyperlink in a year or so
I think 9-10 figures have gone into math automation in 2025, across VC, philanthropy, and a share of frontier-company expenditure (though if we want to count the latter, a proper Fermi estimate would, I think, come out much wider than if you were just adding up all the VC bucks). In the startup case, it looks an awful lot like creating press releases to attract funding and talent, without a lot of product clarity.
I have been guilty in the past of saying “something something Curry-Howard” is a good reason to be bullish on these startups pivoting into successful program synthesis companies, but in reality I think there are different versions of this belief, and without nuance we start to blend the “strategic landscape of formal methods in AI security” with “ways to persuade VCs to fund you”. What I have often meant by invoking Curry-Howard to justify watching the math automation space for AI safety reasons is that the specific capability of shipping Lean proofs disproportionately increases SWE capability, reasoning about programs, etc. (this goes 10x more so if we end up in the “Lean is the successor to Haskell” future where companies start shipping Lean code to production)[1]. Curry-Howard, the correspondence between mathematics and algorithms (propositions as types, proofs as programs), is actually stronger than that. It really just means that specs/intent are expressible in a specific logic and that the proofs are executable. But in the Hoare logic case, which is how we formally verify imperative programs and which influences most schemes I’ve heard of for verifying effectful programs, the proof is not the executable!
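A minimal Lean 4 sketch of the distinction (the definitions, names, and lemma choices below are my own illustration, assuming a recent Lean toolchain, not anything from the post): under the Curry-Howard reading the proof term is itself the program that computes the witness, whereas in the Hoare-style reading the program and the proof about it are separate artifacts and only the program runs.

```lean
-- Curry-Howard reading: the proof *is* the program.
-- A constructive proof that every `n` has a strictly larger number
-- is literally a function that computes the witness.
def biggerThan : ∀ n : Nat, { m : Nat // n < m } :=
  fun n => ⟨n + 1, Nat.lt_succ_self n⟩

-- Hoare-style reading: the executable and the proof are separate artifacts.
-- `double` is the program; `double_even` is a proof object erased at runtime.
def double (n : Nat) : Nat := n + n

theorem double_even (n : Nat) : double n % 2 = 0 := by
  unfold double
  omega
```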
In 2026, these math companies will start to make their product plays, and I think we’ll learn a lot about cognition as we test the hypothesis “how generalizable/economically valuable is pure math ability?”. However, I think program synthesis companies that do not pursue pure math as a warmup / fundraising strategy will take a substantial lead. (This is the core claim of the post; I’m not going to fully justify it in this brief memo, I just want it written down in case I can get Bayes points from it later.) And I roughly agree with Morph’s thesis that an outrageously high-quality binary grader (i.e., Lean 4) leads to RL gains, which lead to a capability edge, such that formal program synthesis companies should outperform other program synthesis companies.
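As a hedged illustration of the “binary grader drives RL gains” mechanism (not Morph’s actual pipeline), the Python sketch below treats a proof checker as a 0/1 reward; `sample_candidate` and `lean_check` are hypothetical stand-ins for a model call and a Lean 4 verification, and the collected rollouts would feed a policy-gradient update elsewhere.

```python
# Hedged sketch: a verifier-as-reward rollout loop for RL on formal tasks.
# `sample_candidate` and `lean_check` are hypothetical stand-ins, not real APIs.
from typing import Callable

def collect_rollouts(
    sample_candidate: Callable[[str], str],   # model proposes a Lean proof/program
    lean_check: Callable[[str], bool],        # binary grader: does it typecheck?
    prompts: list[str],
) -> list[tuple[str, str, float]]:
    """Return (prompt, candidate, reward) triples with a 0/1 verifier reward."""
    rollouts = []
    for prompt in prompts:
        candidate = sample_candidate(prompt)
        reward = 1.0 if lean_check(candidate) else 0.0
        rollouts.append((prompt, candidate, reward))
    return rollouts  # downstream: policy-gradient update on these triples
```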
[1] And while a wildly successful program synthesis product is dual use, it would enable a lot of infrastructure hardening, so it has a good shot at providing a great deal of defensive upside. More posts coming out soon.
To be clear: this strategy is also problematic if you hope to have dependents in the future (i.e., you are dating with family-starting intentions). It’s a selection effect on the partners you can realistically have relationships with.
Source: pain and suffering.