epistemic status: jotting down casually to make sure I have a specific “I Told You So” hyperlink in a year or so
I think 9-10 figures have gone into math automation in 2025, across VC, philanthropy, and some fraction of frontier-company expenditure (though for that last bucket, a proper fermstimate would, I think, come out much wider than just counting up all the VC bucks). In the startup case, it looks an awful lot like creating press releases to attract funding and talent, without a lot of product clarity.
I have been guilty in the past of saying “something something Curry–Howard” is a good reason to be bullish on these startups pivoting into successful program synthesis companies, but in reality there are different versions of this belief, and without nuance we start to blend the “strategic landscape of formal methods in AI security” with “ways to persuade VCs to fund you”. What I have often meant by invoking Curry–Howard to justify watching the math automation space for AI safety reasons is that the specific capability of shipping Lean proofs disproportionately increases SWE capability, reasoning about programs, etc. (this holds 10x more if we end up in the “Lean is the successor to Haskell” future where companies start shipping Lean code to production)[1]. Curry–Howard, the formal-adjacent correspondence between mathematics and algorithms, is actually stronger than that: it says that specs/intent are expressible in a specific logic and that the proofs themselves are executable. But in the Hoare logic case, which is how we formally verify imperative programs, and which influences most schemes I’ve heard of for verifying effectful programs, the proof is not the executable!
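To make that contrast concrete, here is a minimal Lean 4 sketch (all names are mine, purely illustrative). In the first definition, one lambda term is simultaneously the proof and a runnable program; in the second pair, the program and its correctness proof are separate artifacts, and the proof is erased at runtime, which is the shape Hoare-style verification takes.

```lean
-- Curry–Howard side: read through propositions-as-types, this one term
-- is both a proof of A → (A → B) → B (modus ponens) and a runnable
-- higher-order function.
def mp {A B : Type} (a : A) (f : A → B) : B := f a

#eval mp 2 (· + 3)  -- 5: the "proof" computes

-- Hoare-flavored contrast: `add` is the executable; `add_spec` is a
-- proof *about* it that never runs. Delete the theorem and the
-- program's runtime behavior is unchanged.
def add (x y : Nat) : Nat := x + y

theorem add_spec (x y : Nat) : add x y = add y x :=
  Nat.add_comm x y
```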
In 2026, these math companies will start to make their product plays, and I think we’ll learn a lot about cognition as we test the hypothesis “how generalizable/economically valuable is pure math ability?”. However, I think program synthesis companies that do not pursue pure math as a warmup / fundraising strat will take a substantial lead. (This is the core claim of the post, which I’m not going to fully justify in this brief memo; I just want it written down in case I can collect Bayes points from it later.) And I roughly agree with Morph’s thesis that an outrageously high quality binary grader (i.e. Lean 4) leads to RL gains, which lead to a capability edge, such that formal program synthesis companies should outperform other program synthesis companies.
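For intuition about what an “outrageously high quality binary grader” buys you, here is a minimal Python sketch under my own assumptions (a Lean 4 toolchain on PATH, candidates as self-contained .lean files with no external imports); this is the shape of the reward signal, not Morph’s actual pipeline:

```python
import subprocess
import tempfile
from pathlib import Path

def lean_binary_reward(candidate_proof: str, timeout_s: float = 60.0) -> float:
    """Reward 1.0 iff the Lean 4 type checker accepts the candidate, else 0.0.

    The point of the thesis: the grader is exact and binary, so the RL
    signal is noiseless: no learned reward model, no partial credit.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Candidate.lean"
        src.write_text(candidate_proof)
        try:
            result = subprocess.run(
                ["lean", str(src)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # a timeout is a failure; the grader stays binary
        return 1.0 if result.returncode == 0 else 0.0
```

One real-world wrinkle this sketch ignores: `sorry` elaborates with only a warning, so the exit code stays 0 and a production grader has to reject it separately. But the design point stands: acceptance by the type checker is the entire reward, with nothing for the policy to sycophant its way past.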
And while a wildly successful program synthesis product is dual use, it would enable a lot of infrastructure hardening, so it has a good shot at providing a great deal of defensive upside. More posts coming out soon.
I think this only helps with security vulnerabilities (bugs that are not vulnerabilities are not that impactful, and I don’t see this generalizing beyond ordinary software security pre-takeoff). Plausibly, once LLMs can rewrite whole codebases in a different language without making them worse, the step after that is teaching them to rewrite codebases in a much richer dependently typed language, in a way that obsessively tracks all the security properties that come to mind.
LLMs still can’t rewrite codebases (well enough to matter). So this is probably not yet a 2026 thing, but 2027-2028 is plausible, and it probably mostly helps with the still-mostly-speculative future problem of more effective LLM-enabled hacking.
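To gesture at what “obsessively tracks all the security properties” could mean in a dependently typed language, here is a toy Lean 4 sketch (the names and the property are mine, a deliberately crude stand-in for real invariants): the property lives in the type itself, so a rewrite into such types has to carry proofs everywhere the value flows.

```lean
-- Toy tracked security property: strings that provably contain no
-- single quote (a crude stand-in for SQL-injection safety).
abbrev NoQuote (s : String) : Prop := '\'' ∉ s.toList

abbrev SafeSql := { s : String // NoQuote s }

-- Constructing a SafeSql forces the proof obligation, here discharged
-- by computation at elaboration time:
def query : SafeSql := ⟨"SELECT 1", by decide⟩

-- Any function demanding SafeSql can never receive an unchecked string;
-- the invariant travels through the codebase in the types.
def runQuery (q : SafeSql) : String := q.val
```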