arXiv

Hybrid Adversarial Defence for Natural Language Understanding Tasks


Abstract:

Large Language Models (LLMs) are vulnerable to both hallucination and adversarial manipulation. Although the two problems are closely linked, existing defenses usually address them separately. This study introduces a hybrid defense framework that combines entropy-based models, which target hallucination, with uncertainty-based and geometry-based models, which target robustness to attacks.
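The abstract does not say how the three feature families are computed or combined. The following is a minimal illustrative sketch, not the authors' method, and it rests on assumed definitions. Entropy is taken as the Shannon entropy of the mean predictive distribution. Uncertainty is the variance of class probabilities across several stochastic forward passes (for example, Monte Carlo dropout). The geometric feature is the cosine distance from an input embedding to the nearest class centroid. All function names and the equal default weights are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def entropy(probs: Vector) -> float:
    """Shannon entropy (natural log) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_distribution(samples: Sequence[Vector]) -> list:
    """Average class probabilities over stochastic forward passes."""
    n, k = len(samples), len(samples[0])
    return [sum(s[c] for s in samples) / n for c in range(k)]


def mc_uncertainty(samples: Sequence[Vector]) -> float:
    """Mean per-class variance of probabilities across passes (assumed MC-dropout style)."""
    n, k = len(samples), len(samples[0])
    means = mean_distribution(samples)
    total = sum(sum((s[c] - means[c]) ** 2 for s in samples) / n for c in range(k))
    return total / k


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; rejects zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)


def geometric_score(embedding: Vector, centroids: Sequence[Vector]) -> float:
    """Distance from the input embedding to its nearest class centroid."""
    return min(cosine_distance(embedding, c) for c in centroids)


def hybrid_score(samples, embedding, centroids, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of entropy, uncertainty and geometric features (hypothetical weights).

    Higher values mean the input looks more likely to be hallucinated or adversarial.
    """
    features = (
        entropy(mean_distribution(samples)),
        mc_uncertainty(samples),
        geometric_score(embedding, centroids),
    )
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    confident = [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]]
    unstable = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print(hybrid_score(confident, [1.0, 0.0], centroids))
    print(hybrid_score(unstable, [0.7, 0.7], centroids))
```

A consistent, confident input that sits on a class centroid scores low. An input whose predictions flip between passes and whose embedding lies between centroids scores high. In practice the weights and a decision threshold would need tuning on held-out data, and a learned combiner could replace the fixed weighted sum.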

Our evaluation on Natural Language Understanding (NLU) datasets, namely FEVER, HotpotQA, CSQA, and SIQA, shows that the hybrid approach improves performance on clean inputs by up to 43.34%. It also substantially strengthens adversarial robustness, increasing accuracy by up to 64.92% and reducing the attack success rate by up to 62.27%. On the out-of-distribution datasets AeroEngQA and CPIQA, the model remained comparably robust, with accuracy gains of up to 57.14%.
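The abstract reports accuracy gains and attack-success-rate (ASR) reductions but does not state whether these are absolute (percentage-point) or relative changes. The sketch below computes both readings. It assumes the common definition of ASR as the fraction of originally correct predictions that an attack flips to incorrect. The function names and toy numbers are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(clean_preds: Sequence[int],
                        adv_preds: Sequence[int],
                        labels: Sequence[int]) -> float:
    """Share of originally correct examples whose prediction the attack makes wrong.

    Examples the model already got wrong on clean input are skipped (assumed convention).
    """
    attacked = [(a, y) for c, a, y in zip(clean_preds, adv_preds, labels) if c == y]
    if not attacked:
        return 0.0
    return sum(a != y for a, y in attacked) / len(attacked)


def absolute_change(before: float, after: float) -> float:
    """Change expressed in percentage points."""
    return (after - before) * 100.0


def relative_change(before: float, after: float) -> float:
    """Change expressed as a percentage of the baseline value."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    labels = [0, 1, 1, 0, 1]
    clean = [0, 1, 1, 0, 0]
    adv = [1, 1, 0, 0, 0]
    print(accuracy(clean, labels))                  # 4 of 5 correct
    print(attack_success_rate(clean, adv, labels))  # 2 of 4 correct examples flipped
    print(absolute_change(0.80, 0.30), relative_change(0.80, 0.30))
```

For example, an ASR drop from 0.80 to 0.30 is a 50-point absolute reduction but a 62.5% relative reduction. The two interpretations of a headline figure can therefore differ considerably.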

The framework is also effective for prompt-injection detection (SafeGuard) and jailbreak detection (AdvBench, DAN), reducing the attack success rate by up to 51% relative to state-of-the-art baselines. Taken together, these results indicate that combining entropy, uncertainty, and geometric features yields a stronger defense than relying on any single feature type, in both in-domain and out-of-distribution settings.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
