arXiv

AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms

Title: AlgoVeri: A Unified Benchmark for Verified Code Generation in Classical Algorithms

Abstract:

Vericoding, the process of deriving formally verified code from precise specifications, has demonstrated significant potential through recent advancements in AI models. However, the field currently lacks a cohesive framework for evaluating these models across different programming paradigms. Existing benchmarks are fragmented, focusing on isolated languages or tools such as Dafny, Verus, and Lean, and addressing disparate tasks. This fragmentation renders performance metrics from different studies incomparable. To bridge this divide, we introduce AlgoVeri, a standardized benchmark designed to assess the vericoding of $77$ classical algorithms implemented in Dafny, Verus, and Lean.

By applying uniform functional contracts across these platforms, AlgoVeri highlights substantial deficiencies in current verification systems. Our results indicate that while leading models can achieve manageable success rates in Dafny ($40.3$% for Gemini-3 Flash)—benefiting from high-level abstractions and SMT automation that streamline the verification workflow—performance deteriorates significantly in other environments. In Verus, which imposes systems-level memory constraints, the success rate drops to $24.7$%, and in Lean, where explicit proof construction is mandatory, it falls to just $7.8$%.

Furthermore, our analysis of test-time compute dynamics reveals distinct strategies among top models. Gemini-3 leverages iterative repair mechanisms effectively, tripling its pass rates in Dafny, whereas GPT-OSS reaches performance saturation early on. Finally, our error analysis suggests that language architecture influences how models refine their outputs. Dafny enables models to concentrate on logical accuracy, while Verus and Lean tend to hinder progress by imposing persistent syntactic and semantic obstacles. The complete dataset and evaluation code are publicly available at https://github.com/haoyuzhao123/algoveri.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC

Related Articles

Dimon and SpaceX Executives to Pitch IPO to Clients
Bloomberg

Dimon and SpaceX Executives to Pitch IPO to Clients

JPMorgan Chase CEO Jamie Dimon and SpaceX executives are pitching IPO details to clients.

Financial Times

Europe is finally flexing its innovation muscles

The EU’s new tech sovereignty package signals a positive shift from defensive regulation to proactive innovation, markin...

Apollo’s Zelter Expects High-Grade Debt Sales to Top US Treasuries
Bloomberg

Apollo’s Zelter Expects High-Grade Debt Sales to Top US Treasuries

Apollo’s Zelter expects high-grade debt sales to surpass US Treasuries. He anticipates investment-grade debt outperformi...

EU Insurance Watchdog Warns on Loan Risks
Bloomberg

EU Insurance Watchdog Warns on Loan Risks

EIOPA warns insurers to closely monitor loan risks, though initial reports lack specific details on the nature or scope ...

Glazer Family Members Said to Study Manchester United Stake Sale
Bloomberg

Glazer Family Members Said to Study Manchester United Stake Sale

Reports indicate the Glazer family is evaluating a potential sale of their Manchester United stake, with family members ...

Ares' Blair Jacbobson: Disconnect Over Private Credit Headlines
Bloomberg

Ares' Blair Jacbobson: Disconnect Over Private Credit Headlines

Ares’ Blair Jacobson argues that private credit headlines misrepresent reality, highlighting a disconnect between media ...