HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

Integrating LLMs with Lean prover scaffolding.

May be a graphic of text that says "Hermes Agent M #1 Complete the 100v (۷ 4100 80x 40)2 (x Translation Module repeat K, times autoformalizer model invoke he agent ！ Senanan Lean piler #2 import Mathlib Informal proof step backtranslator theorem test Feedback Module 0xx100*y+4100 （×-40）*2+（y-50）2 sorry self-verify the step Formal Lean4 proof step #3 import Mathlib unverified revise your reasoning no Verified by Compiler? goal theorem test Prover Module True proceed with generation yes H +" True repeat up up K, times Lean4 80*x100xy+4100 )*2+(y-50)2 linarith prover model goal 00 Formal Leand proof code α LT Figure 2: Full Hermes framework with illustrative examples."

Summary provided by Xixidu:

Hermes introduces an architecture where a language model is wrapped around an external, high-reliability verifier: the Lean4 proof assistant.
Instead of just asking the AI “Does this look right?”, it translates key mathematical steps into Lean4 code and asks a proof engine “Can this be formally proved?” If the autoformalization is accurate and Lean finds a proof, this gives a strong mathematical guarantee for that formalized step.
Steps:
1. Reasoning (LLM): The model proposes the next logical step of a proof in natural language.
2. Translation Module: An autoformalizer converts that step into a Lean4 statement. A back-translation check compares the Lean statement to the original text to ensure they match in meaning.
3. Prover Module: A prover, working inside Lean4, attempts to prove the statement (or sometimes its negation). It returns a signal such as “proved”, “disproved” (negation proved), or “not decided / failed”.
4. Feedback & Memory:
- If the step is proved, it is stored in a Memory Block (a database of verified facts) and can be retrieved to support later reasoning.
- If the step is disproved or cannot be justified, the LLM is informed of this outcome and is prompted to revise its reasoning rather than continuing from a shaky premise.
In this way, Hermes interleaves informal LLM reasoning with formal Lean4 checking, using the proof assistant as an external source of ground truth for crucial steps.