Petropolitan comments on Implications of the inference scaling paradigm for AI safety

Petropolitan 26 Mar 2025 17:04 UTC
6 points
0
Two months later I tried to try actually implementing a nontrivial conversion of a natural language mathematical argument to a fully formalized Lean proof in order to check if I was indeed underestimating it (TBH, I have never tried a proof assistant before).
So I took a difficult integral from a recent MathSE question I couldn’t solve analytically myself, had Gemini 2.5 Pro solve it 0-shot,^[1] verified it numerically, set up a Lean environment in Google Colab and then asked if another instance of Gemini 2.5 could convert the solution into a proof. It told me that it is indeed hard:
This is not a trivial task. Here’s why:
- Informal vs. Formal: My natural language explanation, while aiming for clarity, likely skipped steps, relied on implicit assumptions (like function continuity, differentiability, domain constraints), or used intuitive leaps that Lean demands be made explicit and rigorously justified using defined theorems and axioms.
- Library Navigation: Finding the exact theorems in mathlib4 that correspond to each step (e.g., the correct version of integration by parts, substitution, limit theorems, properties of specific functions) requires familiarity with the library.
- Side Conditions: Every theorem in Lean (like the chain rule or integration by parts) has precise side conditions (e.g., f is differentiable, g’ is continuous, the function is integrable). The natural language proof might not have explicitly stated or verified all of these, but the Lean proof must.
- Calculations: Even seemingly simple algebraic manipulations or derivative calculations need to be carried out using Lean’s tactics (ring, linarith, simp, rw, etc.) or proven step-by-step.
- Proof Structure & Tactics: Structuring the proof correctly in Lean and using the appropriate tactics (apply, exact, calc, by, etc.) to guide the prover is a skill in itself.
- My Limitations: While I can generate code snippets, generating a complete, correct, and non-trivial formal proof interactively is currently beyond my capabilities. It often requires a human expert to guide the process, debug errors, and find the right lemmas.
<...>
It is highly unlikely that I can produce a complete, automatically verifiable Lean proof for a “tricky” integral directly from a natural language description. However, if you provide the details, I can attempt to sketch out the Lean concepts involved, which could be a starting point for someone (perhaps you in Colab, with time and learning) to build the actual proof.
Gemini and I weren’t able to set up mathlib4 in Lean 4 and I gave up on the task, but already by just looking on a solution Gemini listed the following problems^[2] (I put it here as a screen capture instead of a proper collapsible section because I couldn’t figure out how to copypaste the formulas right):
To sum up, yes, I did underestimate the hardness of the task, it is certainly beyond the reach of current SOTA LLMs.
However, I believe that since this type of task is verifiable in silico and really very convenient for synthetic training data generation, Google folks behind AlphaGeometry are probably going to solve this problem in a year or two.
1. ^
  The fact that an LLM solved it 0-shot is notable in its own right BTW. Generally, I’ld estimate that Gemini 2.5 and o3-mini are able to solve most of the definite integrals posted in MathSE questions. It was very different at the beginning of this year!
2. ^
  I haven’t checked accuracy of all the generated details due to lack of competence and time but generally expect the outline to be broadly correct