Backdooring Masked Diffusion Language Models
Title: Exploiting Vulnerabilities in Masked Diffusion Language Models
Abstract
Masked diffusion language models (MDLMs) are rapidly gaining traction as an alternative approach to text generation, yet their security at training time has received little attention. Existing backdoor attacks designed for Gaussian diffusion models or autoregressive language models do not transfer directly to MDLMs. MDLMs corrupt data through discrete state transitions and generate text by iterative denoising, whereas Gaussian diffusion models add continuous noise and autoregressive models predict tokens left to right. This paper presents the first comprehensive investigation of training-time backdoor attacks targeting MDLMs.
We introduce SHADOWMASK, a novel attack that modifies the forward corruption process of MDLMs. By replacing the standard all-mask terminal distribution with a trigger-mask mixture prior, SHADOWMASK embeds a dedicated denoising pathway from trigger-corrupted states to attacker-specified targets while preserving the model's clean denoising performance. To ground this approach, we provide a rigorous mathematical framework that defines the backdoored forward process, derives the reverse-time posterior, and establishes the continuous-time training objective.
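The forward corruption described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the common MDLM parameterization in which each token is independently kept with probability α_t and otherwise replaced by a draw from a terminal prior π, so that q(z_t | x) = Cat(z_t; α_t·x + (1 − α_t)·π). When π is the one-hot [MASK] distribution and α_1 = 0, this reduces to the standard all-mask terminal state. The per-token mixture below, which places weight λ on a single trigger token, is a hypothetical stand-in for SHADOWMASK's trigger-mask mixture prior, whose exact form is not specified here. The names `corrupt`, `marginal`, `mixture_prior`, `TRIGGER`, and `lam` are illustrative.

```python
import random

MASK = "[MASK]"
TRIGGER = "<trg>"  # hypothetical trigger token, for illustration only


def sample_from(prior, rng):
    """Draw one token from a prior given as {token: probability}."""
    tokens = list(prior)
    return rng.choices(tokens, weights=[prior[t] for t in tokens], k=1)[0]


def corrupt(tokens, alpha_t, prior, rng):
    """Sample z_t ~ q(z_t | x) independently at every position.

    Each token is kept with probability alpha_t and otherwise replaced by a
    draw from `prior`, which realizes Cat(alpha_t * x + (1 - alpha_t) * prior).
    """
    return [tok if rng.random() < alpha_t else sample_from(prior, rng)
            for tok in tokens]


def marginal(x_tok, z_tok, alpha_t, prior):
    """Closed-form q(z_t = z_tok | x = x_tok) for a single position."""
    keep = alpha_t if z_tok == x_tok else 0.0
    return keep + (1.0 - alpha_t) * prior.get(z_tok, 0.0)


# Standard MDLM prior: every corrupted position becomes [MASK].
ALL_MASK = {MASK: 1.0}


def mixture_prior(lam):
    """Hypothetical trigger-mask mixture: trigger w.p. lam, else [MASK]."""
    return {MASK: 1.0 - lam, TRIGGER: lam}


if __name__ == "__main__":
    rng = random.Random(0)
    x = ["the", "cat", "sat", "down"]
    # Partially corrupted clean sequence under the standard prior.
    print(corrupt(x, alpha_t=0.5, prior=ALL_MASK, rng=rng))
    # Fully corrupted (terminal) sequence under the hypothetical mixture.
    print(corrupt(x, alpha_t=0.0, prior=mixture_prior(0.3), rng=rng))
```

Keeping the prior as an explicit argument makes the contrast visible: the clean process and the hypothetical backdoored one share the same corruption rule and differ only in where corrupted positions are sent.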
Our experiments on DiT-based MDLMs and LLaDA-8B-Instruct across the WikiText-103, OpenWebText, and Alpaca datasets show that SHADOWMASK achieves attack success rates approaching 100%. It substantially outperforms standard data poisoning, preserves clean-task utility, and remains effective under both full-model and parameter-efficient fine-tuning. Furthermore, the attack is robust against several representative defenses.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



