Lean-GAP: A Dataset of Formalized Graduate Algebra Problems
Title: Lean-GAP: A Collection of Formalized Graduate Algebra Exercises
Abstract:
This paper introduces Lean-GAP (Lean-Graduate Algebra Problems), a new dataset of 430 graduate-level algebra problems formalized in Lean 4 from exercises in the textbook Abstract Algebra by Dummit and Foote. We describe a scalable workflow with three stages: PDF-to-LaTeX preprocessing, autoformalization into Lean 4, and verification that each formal statement faithfully matches its informal source. Although preprocessing and autoformalization can be largely automated, we find that verification is the most intricate and labor-intensive stage and requires rigorous human oversight.
Our primary contributions are threefold: (i) a structured repository of formalized exercises; (ii) a systematic methodology for formalizing mathematical content from textbooks; and (iii) a detailed analysis of the persistent challenges encountered during formalization. We also evaluate several autoformalization models and identify key bottlenecks in translating informal mathematical statements into formal languages.
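To make the informal-to-formal alignment concrete, the block below is a hypothetical example of what one formalized exercise might look like. It is not taken from the Lean-GAP release; the theorem name and phrasing are illustrative assumptions, and it assumes a Lean 4 project with Mathlib available. The informal statement is the classic exercise "if x^2 = 1 for every x in a group G, then G is abelian" (Dummit and Foote, Section 1.1). As is common in benchmark-style datasets, the proof is left as `sorry`, so the file type-checks while only the statement is fixed.

```lean
import Mathlib

/-- Hypothetical formalization (illustrative, not from the dataset):
    if every element of a group squares to the identity,
    then any two elements commute. -/
theorem abelian_of_sq_eq_one {G : Type*} [Group G]
    (h : ∀ x : G, x ^ 2 = 1) : ∀ a b : G, a * b = b * a := by
  -- The statement is what must be checked for alignment;
  -- the proof is deliberately left open for a prover to fill in.
  sorry
```

One alignment failure that human verification has to catch is a formal statement that silently strengthens the hypotheses. For example, declaring `[CommGroup G]` instead of `[Group G]` still type-checks but makes the conclusion trivial, so the formal statement no longer matches the exercise.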
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



