Automated Conjecture Resolution with Formal Verification
Abstract:
Large language models have advanced substantially in mathematical reasoning, progressing from basic problem-solving tasks to complex, research-oriented challenges. Nevertheless, the inherent ambiguity of natural language makes it hard to resolve and verify such problems reliably. To address this, we introduce an automated framework that combines natural language reasoning with formal verification to solve research-level mathematical problems. The system comprises two agents: an informal reasoning agent named Rethlas and a formal verification agent called Archon.
Rethlas leverages reasoning primitives alongside our theorem search engine, Matlas, to investigate potential solution strategies and generate candidate proofs. Meanwhile, Archon utilizes LeanSearch to convert informal arguments into fully formalized Lean 4 projects. Through a process involving task decomposition, iterative refinement, and automated proof synthesis, Archon guarantees that the resulting proofs are machine-checkable for correctness.
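As a rough illustration of what "task decomposition" and "machine-checkable" mean in this setting, the toy Lean 4 snippet below splits a statement into a helper lemma and a main theorem that uses it. It is not taken from the paper: the names `step_one` and `main_goal` are hypothetical, and the statement is deliberately trivial. Research-level formalizations of the kind described above would be far larger and would normally build on a library such as Mathlib.

```lean
-- Toy sketch only; `step_one` and `main_goal` are hypothetical names.

-- A decomposed subgoal: a small helper lemma proved on its own.
theorem step_one (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The main statement, assembled from the helper lemma.
-- Rewriting with `step_one` turns the left-hand side into the right-hand
-- side, and `rw` then closes the goal by reflexivity.
theorem main_goal (a b c : Nat) : (a + b) + c = (b + a) + c := by
  rw [step_one a b]
```

If the file compiles, the Lean kernel has accepted both proofs, so correctness does not depend on a reader's interpretation of an informal argument.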
By employing this framework, we successfully resolved an open problem in commutative algebra and formally verified the proof in Lean 4 with minimal human intervention. Supplementary case studies highlight Rethlas’s proficiency in informal mathematical reasoning and discovery, as well as Archon’s capacity to formalize research-level proofs within Lean 4. Our experimental results indicate that robust theorem retrieval tools facilitate the discovery and application of mathematical techniques across different domains, while the formal agent can independently bridge nontrivial gaps in informal arguments. More broadly, this study presents a viable paradigm for mathematical research, demonstrating how informal and formal reasoning systems, augmented by theorem retrieval tools, can work together to generate verifiable outcomes, decrease human workload, and foster human-AI collaborative research.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





