arXiv

Bypassing Prompt Guards in Production with Controlled-Release Prompting

Title: Evading Production Prompt Guards via Controlled-Release Prompting

Abstract:

Recent work by Ball et al. showed that prompt filtering for AI alignment faces a fundamental theoretical barrier: under standard cryptographic assumptions, no filter that runs significantly faster than the model it protects can distinguish adversarial from benign inputs in all cases. This study asks whether that theoretical limitation translates into concrete security flaws in deployed large language models (LLMs). We confirm that it does by presenting "controlled-release prompting," a practical instantiation of the theoretical framework that exploits the computational gap between lightweight input filters and the far more powerful models they are meant to protect. Unlike the purely theoretical constructions, our method does not require modifying the target model. Instead, it crafts malicious prompts that are opaque to any resource-bounded filter yet fully understandable to the target LLM. In our experiments the attack succeeds on four prominent chat platforms (Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat), where conventional defenses prove ineffective. We also demonstrate the extraction of copyrighted material from Gemini using the same technique. Finally, we evaluate 14 open-weight prompt guard models and find that even filters with reasoning capabilities fail to detect these attacks consistently without incurring unacceptable resource costs.


Source: arXiv

Generated at: 2026-06-04 00:00:00 UTC
