arXiv

Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing

Title: Assessing the Resilience of Autoformalization through Semantically Equivalent Paraphrasing

Abstract: Large Language Models (LLMs) have recently established themselves as potent instruments for autoformalization. Nevertheless, despite their strong capabilities, these systems often encounter difficulties in generating formalizations that are both grounded and verifiable. Previous research within the text-to-SQL domain has highlighted that LLMs exhibit sensitivity to paraphrased natural language (NL) inputs, even when the semantic integrity of the original text is largely maintained. This study examines this phenomenon within the context of autoformalization. We specifically assess the robustness of LLMs in producing formal proofs from semantically comparable paraphrased NL statements by evaluating both semantic accuracy and compilation validity. Employing the MiniF2F benchmark and the Lean 4 adaptation of ProofNet, alongside two contemporary LLMs, we generate paraphrased NL statements and conduct cross-evaluations of these inputs across the respective models. Our findings indicate significant performance fluctuations when processing paraphrased inputs, underscoring that slight alterations in NL phrasing can substantially influence model outcomes.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
