AtomEval: Validity-Aware Atomic Evaluation of Adversarial Claim Rewriting in Fact Verification
Title: AtomEval: A Validity-Aware Framework for Assessing Adversarial Claim Rewriting in Fact Verification
Abstract:
Large language models (LLMs) can rewrite refuted claims so that they bypass evidence-based fact verification systems, but the standard attack success rate (ASR) metric often overstates how well such attacks work. The inflation arises because standard evaluations do not distinguish genuine evasion from rewrites that inadvertently alter, dilute, or correct the false premise they were meant to preserve. To address this limitation, we present AtomEval, a validity-aware evaluation protocol for adversarial claim rewriting under fixed-evidence conditions.
AtomEval decomposes claims into subject–relation–object–modifier (SROM) atoms and applies a one-way preservation gate that filters out rewrites that change the underlying proposition, thereby isolating genuine verifier evasion. It then computes a validity-aware attack success rate (VASR), which counts only rewrites that both evade the verifier and retain the original false proposition. AtomEval also provides fine-grained diagnostics that show where proposition preservation fails and flag rewrites that are valid but not minimal.
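
As a rough illustration of how ASR and VASR can diverge, the sketch below treats each claim as a set of SROM atoms and models the one-way preservation gate as a subset check: a rewrite passes only if every atom of the original claim still appears in it. This is a simplification under stated assumptions, not the paper's implementation. The `Atom` and `Rewrite` types, the exact-match comparison, the reading of "minimal" as "adds no extra atoms", and the toy claims are all hypothetical; the abstract does not specify how atoms are extracted or matched.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, NamedTuple, Optional


class Atom(NamedTuple):
    """One subject-relation-object-modifier (SROM) atom (hypothetical layout)."""
    subject: str
    relation: str
    obj: str
    modifier: Optional[str] = None


@dataclass(frozen=True)
class Rewrite:
    """A rewritten claim plus the verifier outcome (hypothetical record)."""
    original_atoms: FrozenSet[Atom]
    rewrite_atoms: FrozenSet[Atom]
    evaded: bool  # True if the verifier no longer labels the claim as refuted


def preserves(r: Rewrite) -> bool:
    # ASSUMPTION: "one-way" is modeled as original atoms being a subset of the
    # rewrite's atoms. Exact matching stands in for whatever comparison the
    # real protocol uses.
    return r.original_atoms <= r.rewrite_atoms


def is_minimal(r: Rewrite) -> bool:
    # ASSUMPTION: a valid rewrite is "minimal" if it adds no extra atoms.
    return preserves(r) and r.rewrite_atoms == r.original_atoms


def asr(rewrites: List[Rewrite]) -> float:
    """Plain attack success rate: fraction of rewrites that evade the verifier."""
    if not rewrites:
        return 0.0
    return sum(r.evaded for r in rewrites) / len(rewrites)


def vasr(rewrites: List[Rewrite]) -> float:
    """Validity-aware ASR: evading rewrites that also pass the preservation gate."""
    if not rewrites:
        return 0.0
    return sum(r.evaded and preserves(r) for r in rewrites) / len(rewrites)


# Toy data (invented for illustration, not taken from FEVER).
false_atom = Atom("Eiffel Tower", "located in", "Berlin")
original = frozenset({false_atom})

rewrites = [
    # Keeps the false proposition and evades the verifier: a valid attack.
    Rewrite(original, frozenset({false_atom}), evaded=True),
    # "Evades" only because the false object was corrected.
    Rewrite(original, frozenset({Atom("Eiffel Tower", "located in", "Paris")}), evaded=True),
    # "Evades" only because the claim was hedged with a modifier.
    Rewrite(original, frozenset({Atom("Eiffel Tower", "located in", "Berlin", "reportedly")}), evaded=True),
    # Keeps the proposition (plus an extra atom) but is still refuted.
    Rewrite(original, frozenset({false_atom, Atom("Eiffel Tower", "built in", "1889")}), evaded=False),
]

print(f"ASR={asr(rewrites):.2f} VASR={vasr(rewrites):.2f}")
```

In this toy sample, three of four rewrites evade the verifier (ASR = 0.75), but only one of them keeps the original false proposition (VASR = 0.25). The other two count as successes under plain ASR only because they correct or hedge the claim.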
Applied to rewrites of refuted claims from the FEVER dataset, AtomEval reveals substantial ASR inflation: many purported attacks succeed only by modifying, weakening, or correcting the target proposition rather than preserving it. By making preservation of the attacked proposition explicit and quantifiable, AtomEval establishes a robust and stable benchmark for assessing adversarial rewriters, one that weighs verifier evasion against fidelity to the original false claim.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





