arXiv

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

Abstract:

Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences under bidirectional context, which gives them a safety profile markedly different from that of autoregressive LLMs. Because mask tokens are native inputs and the order in which tokens are committed is driven by confidence scores rather than by position, malicious content can be injected through infilling at positions outside the monitored prefix. Existing jailbreak methods either overlook this native infilling capability or rely on low-diversity, mask-based templates applied uniformly across objectives, with no structural adaptation and no way to reuse accumulated attack experience.
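To make the ordering point concrete, here is a toy sketch of confidence-driven unmasking, in which the most confident masked positions are committed first regardless of where they sit in the sequence. It is a generic illustration of dLLM-style decoding, not MaskForge or any specific model's sampler. The `MASK` placeholder, the function name, and the fixed proposal and confidence lists are hypothetical; a real model recomputes its predictions after every step.

```python
from typing import List, Optional

MASK = None  # hypothetical stand-in for the [MASK] token


def unmask_by_confidence(
    tokens: List[Optional[str]],
    proposals: List[str],
    confidences: List[float],
    per_step: int = 1,
) -> List[List[Optional[str]]]:
    """Fill masked positions in order of confidence, not left to right.

    proposals[i] and confidences[i] play the role of the model's predicted
    token and its probability at position i. They are fixed here for
    simplicity (an assumption); real samplers re-predict every step.
    Returns the sequence after each step, starting with the input.
    """
    if per_step < 1:
        raise ValueError("per_step must be at least 1")
    seq = list(tokens)
    history = [list(seq)]
    while any(t is MASK for t in seq):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        # Commit the most confident masked positions, wherever they are.
        chosen = sorted(masked, key=lambda i: confidences[i], reverse=True)[:per_step]
        for i in chosen:
            seq[i] = proposals[i]
        history.append(list(seq))
    return history


tokens = ["the", MASK, "sat", MASK, "the", MASK]
proposals = ["the", "cat", "sat", "on", "the", "mat"]
confidences = [1.0, 0.6, 1.0, 0.9, 1.0, 0.8]
history = unmask_by_confidence(tokens, proposals, confidences)
for step in history:
    print(step)
```

With these inputs, positions 3 and 5 are filled before position 1, so text to the right of an earlier gap is committed first; this is the property that lets content appear outside a left-to-right prefix.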

To address these limitations, we introduce MaskForge, a fully black-box adaptive attack that casts red-teaming of dLLMs as an adaptive search over an expanding library of structural patterns. MaskForge distills successful exploits into reusable schemas, uses an upper-confidence-bound (UCB) bandit to select patterns compatible with each goal, and invokes a scorer-guided fallback when existing patterns prove ineffective. Because successful attempts are distilled back into the pattern library, attack experience accumulates across objectives.
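The bandit step can be sketched with textbook UCB1 over a library that grows over time. This is a minimal, context-free sketch under stated assumptions: the class and method names are hypothetical, rewards are assumed to be scorer outputs normalized to [0, 1], and MaskForge's per-goal compatibility filtering and scorer-guided fallback are not modeled. `add_pattern` only stands in for distilling a new schema into the library.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PatternLibrary:
    """Toy UCB1 bandit over a growing set of pattern IDs (hypothetical)."""

    c: float = math.sqrt(2)  # exploration weight; sqrt(2) gives classic UCB1
    pulls: Dict[str, int] = field(default_factory=dict)
    reward_sum: Dict[str, float] = field(default_factory=dict)

    def add_pattern(self, pattern_id: str) -> None:
        # A newly distilled pattern starts with zero trials.
        self.pulls.setdefault(pattern_id, 0)
        self.reward_sum.setdefault(pattern_id, 0.0)

    def select(self) -> str:
        if not self.pulls:
            raise ValueError("pattern library is empty")
        # Untried patterns have an unbounded UCB score, so try them first
        # (in insertion order).
        for pid, n in self.pulls.items():
            if n == 0:
                return pid
        total = sum(self.pulls.values())

        def ucb(pid: str) -> float:
            n = self.pulls[pid]
            mean = self.reward_sum[pid] / n
            # mean reward + exploration bonus that shrinks with more trials
            return mean + self.c * math.sqrt(math.log(total) / n)

        return max(self.pulls, key=ucb)

    def update(self, pattern_id: str, reward: float) -> None:
        # Assumes reward in [0, 1]; raises KeyError for unknown patterns.
        self.pulls[pattern_id] += 1
        self.reward_sum[pattern_id] += reward


lib = PatternLibrary()
for pid in ["pattern-A", "pattern-B"]:
    lib.add_pattern(pid)
chosen = lib.select()
lib.update(chosen, reward=1.0)
```

Untried patterns are chosen before any scores are compared, so a freshly added schema is explored before established patterns are re-ranked; afterwards its exploration bonus shrinks as it accumulates trials.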

Across five public dLLMs and three benchmarks, MaskForge achieves an average attack success rate (ASR) of 79.3%, a 17.6% relative improvement over the strongest competing dLLM baseline. Moreover, the refined pattern library transfers to AdvBench without further updates, reaching an ASR of 88.2%, a 67% relative improvement over the strongest competing baseline.
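As a rough consistency check, assuming "relative improvement" $r$ means $(\mathrm{ASR}_{\text{ours}} - \mathrm{ASR}_{\text{base}})/\mathrm{ASR}_{\text{base}}$ and ignoring rounding, the figures above imply baseline success rates of roughly 67.4% and 52.8%. These baseline values are derived here, not stated in the abstract.

```latex
\mathrm{ASR}_{\text{base}} = \frac{\mathrm{ASR}_{\text{ours}}}{1 + r},
\qquad \frac{79.3\%}{1.176} \approx 67.4\%,
\qquad \frac{88.2\%}{1.67} \approx 52.8\%
```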


Source: arXiv. Generated at: 2026-06-04 00:00:00 UTC
