Research

Paper

TESTING February 19, 2026

M2F: Automated Formalization of Mathematical Literature at Scale

Authors

Zichen Wang, Wanli Ma, Zhenyu Ming, Gong Zhang, Kun Yuan, Zaiwen Wen

Abstract

Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring that entire projects compile end-to-end. We present M2F (Math-to-Formal), the first agentic framework for end-to-end, project-scale autoformalization in Lean. The framework operates in two stages. The statement compilation stage splits the document into atomic blocks, orders them via inferred dependencies, and repairs declaration skeletons until the project compiles, allowing placeholders in proofs. The proof repair stage closes these holes under fixed signatures using goal-conditioned local edits. Throughout both stages, M2F keeps the verifier in the loop, committing edits only when toolchain feedback confirms improvement. In approximately three weeks, M2F converts long-form mathematical sources into a project-scale Lean library of 153,853 lines from 479 pages textbooks on real analysis and convex analysis, fully formalized as Lean declarations with accompanying proofs. This represents textbook-scale formalization at a pace that would typically require months or years of expert effort. On FATE-H, we achieve $96\%$ proof success (vs.\ $80\%$ for a strong baseline). Together, these results demonstrate that practical, large-scale automated formalization of mathematical literature is within reach. The full generated Lean code from our runs is available at https://github.com/optsuite/ReasBook.git.

Metadata

arXiv ID: 2602.17016
Provider: ARXIV
Primary Category: cs.AI
Published: 2026-02-19
Fetched: 2026-02-21 18:51

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2602.17016v1</id>\n    <title>M2F: Automated Formalization of Mathematical Literature at Scale</title>\n    <updated>2026-02-19T02:25:23Z</updated>\n    <link href='https://arxiv.org/abs/2602.17016v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2602.17016v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring that entire projects compile end-to-end. We present M2F (Math-to-Formal), the first agentic framework for end-to-end, project-scale autoformalization in Lean. The framework operates in two stages. The statement compilation stage splits the document into atomic blocks, orders them via inferred dependencies, and repairs declaration skeletons until the project compiles, allowing placeholders in proofs. The proof repair stage closes these holes under fixed signatures using goal-conditioned local edits. Throughout both stages, M2F keeps the verifier in the loop, committing edits only when toolchain feedback confirms improvement. In approximately three weeks, M2F converts long-form mathematical sources into a project-scale Lean library of 153,853 lines from 479 pages textbooks on real analysis and convex analysis, fully formalized as Lean declarations with accompanying proofs. This represents textbook-scale formalization at a pace that would typically require months or years of expert effort. On FATE-H, we achieve $96\\%$ proof success (vs.\\ $80\\%$ for a strong baseline). Together, these results demonstrate that practical, large-scale automated formalization of mathematical literature is within reach. The full generated Lean code from our runs is available at https://github.com/optsuite/ReasBook.git.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <published>2026-02-19T02:25:23Z</published>\n    <arxiv:primary_category term='cs.AI'/>\n    <author>\n      <name>Zichen Wang</name>\n    </author>\n    <author>\n      <name>Wanli Ma</name>\n    </author>\n    <author>\n      <name>Zhenyu Ming</name>\n    </author>\n    <author>\n      <name>Gong Zhang</name>\n    </author>\n    <author>\n      <name>Kun Yuan</name>\n    </author>\n    <author>\n      <name>Zaiwen Wen</name>\n    </author>\n  </entry>"
}