Widening the Gap: Exploiting LLM Quantization via Outlier Injection
Title: Expanding the Divide: Leveraging Outlier Injection to Compromise LLM Quantization
Abstract:
The quantization of Large Language Models (LLMs) has emerged as a crucial technique for enabling memory-efficient deployment. However, recent studies have highlighted significant security vulnerabilities inherent in quantization protocols. Specifically, these schemes can be manipulated by adversaries to release models that appear harmless in their full-precision state but reveal malicious functionality once processed by users through quantization. Despite this, previous quantization-conditioned attacks have been constrained to rudimentary quantization methods, relying on the attacker’s ability to identify weight regions that remain stable under the specific target quantization. Consequently, prior attempts have consistently struggled to breach more prevalent and complex quantization frameworks, thereby limiting their real-world applicability.
In this study, we present the first quantization-conditioned attack capable of reliably triggering malicious behavior across a wide array of advanced quantization techniques, including AWQ, GPTQ, and GGUF I-quants. Our approach capitalizes on a common characteristic of modern quantization algorithms: a large outlier in a weight block can force the other weights in that block to be rounded to zero. By strategically injecting outliers into designated weight blocks, an adversary can engineer a predictable and targeted collapse of the model’s weights. This mechanism allows for the creation of full-precision models that seem innocuous but exhibit diverse malicious behaviors after quantization.
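The rounding effect described above can be illustrated with a toy example. The sketch below assumes a generic symmetric absmax round-to-nearest quantizer applied to a single 8-weight block at 4 bits; it is not AWQ, GPTQ, or GGUF I-quants, and it is not the authors' attack. The function name `quantize_block` and all weight values are hypothetical, chosen only to show how one large value stretches the block's scale until the remaining weights fall below half a quantization step.

```python
def quantize_block(weights, bits=4):
    """Symmetric absmax round-to-nearest quantization of one weight block.

    Illustrative assumption: one shared scale per block, derived from the
    largest absolute value in that block.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax     # step size for the block
    if scale == 0.0:
        return list(weights)                        # all-zero block, nothing to do
    # Quantize to the integer grid, then dequantize back to floats.
    return [round(w / scale) * scale for w in weights]


# A block of small weights (hypothetical values).
block = [0.02, -0.03, 0.01, 0.04, -0.02, 0.03, -0.01, 0.02]
clean = quantize_block(block)

# The same block with its last weight replaced by a large outlier.
with_outlier = block[:-1] + [2.0]
collapsed = quantize_block(with_outlier)

zeros_clean = sum(1 for w in clean if w == 0.0)
zeros_outlier = sum(1 for w in collapsed[:-1] if w == 0.0)

print(zeros_clean)    # 0: without the outlier, every weight keeps a nonzero value
print(zeros_outlier)  # 7: with the outlier, all other weights round to zero
```

In this toy setting the outlier raises the step size from roughly 0.0057 to roughly 0.29, so every remaining weight is smaller than half a step and is rounded to zero. Real quantization schemes use more elaborate scale selection, so this only conveys the intuition.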
Through comprehensive evaluations spanning multiple LLMs and three distinct attack scenarios, we demonstrate that our method achieves high success rates against a variety of sophisticated quantization methods that have previously withstood such attacks. These findings provide the first evidence that quantization-related security threats are not confined to basic schemes but extend to complex, widely adopted quantization methods.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC





