LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
Title: LEAP: Enhancing LLMs in Formal Mathematics via Agentic Architectures
Abstract:
While Large Language Models (LLMs) are proficient at informal mathematical reasoning, they often struggle to produce mechanically verifiable proofs in formal languages such as Lean. To address this limitation, we introduce LEAP, an agentic framework that equips general-purpose foundation models with state-of-the-art capabilities in automated formal theorem proving. LEAP builds on strengths these models already have, including informal reasoning, instruction following, and iterative self-refinement. The system bridges informal proof blueprints and formal proof construction by decomposing intricate problems into manageable subproblems and interacting continuously with the Lean compiler.
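The abstract does not specify LEAP's internal interfaces, so the following is only a minimal sketch of the general pattern it describes: split a problem into subgoals, ask a model for a candidate proof of each, and feed compiler errors back until the proof checks or a round budget runs out. The names `CompileResult`, `refine_subgoal`, `prove_by_decomposition`, `toy_prove`, `toy_check`, and `max_rounds` are hypothetical, and the real LLM and Lean compiler calls are replaced by toy stubs so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CompileResult:
    """Hypothetical verdict from a proof checker (stand-in for the Lean compiler)."""
    ok: bool
    errors: list[str] = field(default_factory=list)


# Hypothetical interfaces: a real system would call an LLM and the Lean compiler here.
Prover = Callable[[str, list[str]], str]       # (goal, previous errors) -> candidate proof
Checker = Callable[[str, str], CompileResult]  # (goal, proof) -> compiler verdict


def refine_subgoal(goal: str, prove: Prover, check: Checker,
                   max_rounds: int = 4) -> Optional[str]:
    """Propose a proof, check it, and retry with the checker's errors as feedback."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        proof = prove(goal, feedback)
        result = check(goal, proof)
        if result.ok:
            return proof
        feedback = result.errors  # the next attempt sees the latest errors
    return None  # round budget exhausted without a verified proof


def prove_by_decomposition(subgoals: list[str], prove: Prover, check: Checker,
                           max_rounds: int = 4) -> Optional[dict[str, str]]:
    """Succeed only if every subgoal receives a checked proof."""
    proofs: dict[str, str] = {}
    for goal in subgoals:
        proof = refine_subgoal(goal, prove, check, max_rounds)
        if proof is None:
            return None
        proofs[goal] = proof
    return proofs


# Toy stubs: the "model" answers `sorry` until it has seen an error,
# and the "compiler" rejects any proof containing `sorry`.
def toy_prove(goal: str, feedback: list[str]) -> str:
    return "by simp" if feedback else "sorry"


def toy_check(goal: str, proof: str) -> CompileResult:
    if "sorry" in proof:
        return CompileResult(False, ["declaration uses 'sorry'"])
    return CompileResult(True)


if __name__ == "__main__":
    print(prove_by_decomposition(["lemma_a", "lemma_b"], toy_prove, toy_check))
    # {'lemma_a': 'by simp', 'lemma_b': 'by simp'}
```

Each subgoal succeeds on its second round because the first rejection supplies the feedback the toy prover reacts to. Failing fast on an unprovable subgoal is one simple policy; a real agent might instead re-plan the decomposition.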
To provide a rigorous evaluation standard beyond increasingly saturated benchmarks, we present Lean-IMO-Bench, a new benchmark of IMO-style problems formalized in Lean. Its problem statements are concise but demand highly non-routine, multi-step proofs, and the problems span a broad range of difficulty.
Empirical results show LEAP's strong performance. On the 2025 Putnam Competition, a prestigious annual mathematics contest for North American undergraduates, LEAP solved all 12 problems, on par with recent results from leading formal mathematics models. On Lean-IMO-Bench, LEAP raises the one-shot formal solve rate of general-purpose LLMs from under 10% to 70%, substantially outperforming the 48% achieved by a specialized IMO system of gold-medal caliber. We also demonstrate LEAP's potential for research-level work by autonomously formalizing complex proofs for open combinatorial problems, including a verified proof of a critical subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



