arXiv

Learning Unmasking Policies for Diffusion Language Models

June 3, 2026 · Metod Jazbec, Theo X. Olausson, Louis Béthune, Pierre Ablin, Michael Kirchhof, João Monteiro, Victor Turrisi, Jason Ramapuram, Marco Cuturi

Title: Learning Unmasking Policies for Diffusion Language Models

Abstract:

Diffusion (Large) Language Models (dLLMs) have reached parity with autoregressive models in downstream task performance, while offering the potential for greater inference efficiency. A pivotal component of dLLMs is the sampling procedure, which determines which tokens to unmask at each diffusion step. Previous studies indicate that heuristic methods, such as confidence thresholding, improve both token throughput and sample quality relative to random unmasking. Nevertheless, these heuristics have limitations: they require manual parameter tuning, and their performance degrades as the block size increases. To address these issues, we introduce a reinforcement learning framework to train sampling procedures. We model masked diffusion sampling as a Markov decision process, with the dLLM acting as the environment, and develop a compact policy based on a single-layer transformer. This policy maps dLLM token confidence scores to unmasking decisions. Our results demonstrate that these learned policies match the performance of leading heuristics when combined with semi-autoregressive (block) generation, while surpassing them in the full-diffusion setting.
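
To make the confidence-thresholding heuristic mentioned in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The names `select_by_threshold`, `sample`, and `toy_model` are hypothetical, as are the threshold value of 0.9 and the fallback of unmasking the single most confident position when nothing clears the threshold. The toy model returns fixed confidences and stands in for a real dLLM forward pass. In the Markov decision process view, the partially masked sequence is the state, the set of positions to unmask is the action, and a learned policy would take the place of `select_by_threshold`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

MASK = None  # hypothetical marker for a position that is still masked


def select_by_threshold(confidences: Sequence[float],
                        is_masked: Sequence[bool],
                        threshold: float = 0.9) -> List[int]:
    """Pick the masked positions whose confidence is at least `threshold`."""
    masked = [i for i, m in enumerate(is_masked) if m]
    if not masked:
        return []
    chosen = [i for i in masked if confidences[i] >= threshold]
    if not chosen:
        # Assumed fallback: unmask the most confident masked position
        # so that every step makes progress and sampling terminates.
        chosen = [max(masked, key=lambda i: confidences[i])]
    return chosen


Model = Callable[[List[Optional[str]]], Tuple[List[str], List[float]]]


def sample(model: Model, length: int,
           threshold: float = 0.9) -> Tuple[List[Optional[str]], int]:
    """Run the unmasking loop until no masked position remains."""
    tokens: List[Optional[str]] = [MASK] * length
    steps = 0
    while any(t is MASK for t in tokens):
        # One "environment" call: the model proposes a token and a
        # confidence score for every position.
        predictions, confidences = model(tokens)
        is_masked = [t is MASK for t in tokens]
        for i in select_by_threshold(confidences, is_masked, threshold):
            tokens[i] = predictions[i]
        steps += 1
    return tokens, steps


def toy_model(tokens: List[Optional[str]]) -> Tuple[List[str], List[float]]:
    """Stand-in for a dLLM: fixed predictions and fixed confidences."""
    predictions = [f"tok{i}" for i in range(len(tokens))]
    confidences = [0.95, 0.5, 0.92, 0.3][:len(tokens)]
    return predictions, confidences


if __name__ == "__main__":
    result, num_steps = sample(toy_model, length=4)
    print(result, num_steps)
```

With this toy model, the first step unmasks two positions at once and the remaining two are unmasked one at a time through the fallback. The number of model calls therefore depends directly on the hand-picked threshold, which is the tuning burden the abstract points to.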
