arXiv

Which Defense Closes Which Threat? Attributing OWASP-LLM-Top-10 Coverage and Its Brittleness Under Paraphrasing

Title: Mapping Defense Mechanisms to Specific Threats: An Analysis of OWASP-LLM-Top-10 Coverage and Vulnerability to Paraphrasing

Abstract: While production Large Language Model (LLM) applications typically employ a layered defense strategy—combining refusal-phrase filters, token-budget constraints, model allowlists, rate limits, and tool-registry authentication—current breach-and-attack-simulation (BAS) benchmarks often obscure the specific efficacy of these measures by reporting only a single aggregate coverage metric. This study investigates the attribution of these defenses. By integrating four agents designed to address OWASP-LLM-Top-10 vulnerabilities into a baseline scanner of 21 agents, we evaluated four synthetic LLM endpoints: $L_0$ (unprotected), $L_1$ (refusal filters only), $L_2$ (budget controls only), and $L_3$ (the complete defense stack). It is important to note that $L_1$ and $L_2$ function as independent, single-axis ablations rather than subsets of one another, while $L_3$ represents their combination augmented with tool-registry authentication and credential scrubbing.

Analysis across $N=10$ replications yielded distinct findings for each OWASP category. The refusal mechanism alone successfully eliminated all instances of LLM01 (jailbreaking) and LLM07 (system prompt leakage). Conversely, budget controls alone were effective against LLM02 (sensitive information disclosure) and LLM10 (unbounded consumption) by terminating multi-step attack sequences. However, mitigating LLM06 (excessive agency) required the implementation of the full defense stack.

We further examined the robustness of these defenses against paraphrasing attacks. Using 300 paraphrases generated by Gemini ($K=5$ variations across a 60-template corpus), we observed that $L_1$’s refusal block rate dropped by 15 percentage points for LLM01 and 25 percentage points for LLM07. Additionally, we introduced a fifth target, $L_4$-real, which replaced the stub backend with Gemini-2.5-flash while maintaining the same $L_3$ regex configuration. This setup mirrored $L_1$’s performance exactly, suggesting that within this context, there was no measurable alignment contribution beyond the regex rules (a finding specific to this experimental setup, not a general assertion about alignment capabilities). Notably, budget controls demonstrated resilience against such mutations, showing no decline in performance (0 pp) once the rate-limit floor was accounted for. These results indicate that while a refusal whitelist may pass static benchmarks, it can be circumvented by an LLM-driven paraphraser without altering the underlying attack intent; in contrast, budget controls proved resistant to the same type of mutation.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC

Related Articles

TikTok Billionaire Tops Ambani as Asia’s Second-Richest
Bloomberg

TikTok Billionaire Tops Ambani as Asia’s Second-Richest

TikTok founder surpasses Mukesh Ambani to become Asia’s second-richest person, marking a significant shift in the region...

Publishers in UK can opt out of Google AI search results
BBC News

Publishers in UK can opt out of Google AI search results

UK publishers can now opt out of Google’s AI search summaries, a CMA ruling designed to boost their bargaining power and...

Kioxia Edges Nearer Toyota’s Market Cap in Shakeup to Japan Inc.
Bloomberg

Kioxia Edges Nearer Toyota’s Market Cap in Shakeup to Japan Inc.

Kioxia’s market cap nears Toyota’s, signaling a major shift in Japan’s corporate hierarchy. This narrowing gap highlight...

Reuters

Morning Bid: Marvell, a fitting name for the latest AI darling

Reuters highlights Marvell as a top AI stock, noting its name perfectly suits its status as the newest market darling.

Financial Times

Tim Hayward: I built the Jaguar E-Type of computer keyboards

Tim Hayward compares his bespoke keyboard designs to the Jaguar E-Type. He explores high-end customization for personal ...

Financial Times

AI Labs: Zuckerberg’s $100bn gamble

Meta’s $100 billion AI investment aims to secure AI dominance, but questions remain whether sheer spending can outpace c...