AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms
Title: AlgoVeri: A Unified Benchmark for Verified Code Generation in Classical Algorithms
Abstract:
Vericoding, the process of deriving formally verified code from precise specifications, has demonstrated significant potential through recent advancements in AI models. However, the field currently lacks a cohesive framework for evaluating these models across different programming paradigms. Existing benchmarks are fragmented, focusing on isolated languages or tools such as Dafny, Verus, and Lean, and addressing disparate tasks. This fragmentation renders performance metrics from different studies incomparable. To bridge this divide, we introduce AlgoVeri, a standardized benchmark designed to assess the vericoding of $77$ classical algorithms implemented in Dafny, Verus, and Lean.
By applying uniform functional contracts across these platforms, AlgoVeri highlights substantial deficiencies in current verification systems. Our results indicate that while leading models can achieve manageable success rates in Dafny ($40.3$% for Gemini-3 Flash)—benefiting from high-level abstractions and SMT automation that streamline the verification workflow—performance deteriorates significantly in other environments. In Verus, which imposes systems-level memory constraints, the success rate drops to $24.7$%, and in Lean, where explicit proof construction is mandatory, it falls to just $7.8$%.
Furthermore, our analysis of test-time compute dynamics reveals distinct strategies among top models. Gemini-3 leverages iterative repair mechanisms effectively, tripling its pass rates in Dafny, whereas GPT-OSS reaches performance saturation early on. Finally, our error analysis suggests that language architecture influences how models refine their outputs. Dafny enables models to concentrate on logical accuracy, while Verus and Lean tend to hinder progress by imposing persistent syntactic and semantic obstacles. The complete dataset and evaluation code are publicly available at https://github.com/haoyuzhao123/algoveri.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC





