Attention-Based Sampler for Diffusion Language Models

June 4, 2026 · Yuyan Zhou, Kai Syun Hou, Weiyu Chen, James Kwok

Original: arXiv:2604.08564v2

Abstract: Auto-regressive models (ARMs) currently dominate language modeling, but their strictly sequential sampling creates inherent bottlenecks in both inference speed and modeling flexibility. Diffusion-based large language models (dLLMs) have emerged as a promising alternative that enables parallel sampling and greater flexibility. However, existing dLLM sampling techniques rely mainly on token-level information and neglect the broader structural context of the sequence, which often leads to suboptimal results.

We study the choice of sampling order from the perspective of log-likelihood maximization. We show that this optimization problem is NP-hard, and we develop an approximation based on optimal sampling ranks that makes it computationally tractable. We then show that this tractable objective is maximized when tokens are sampled in descending order of their attention-matrix column sums. This result gives a theoretical foundation for attention-guided sampling as an alternative to conventional greedy search strategies.
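The ordering rule can be illustrated with a small sketch. It assumes a single square attention matrix whose rows are queries and whose columns are keys, so a column sum measures how much attention a position receives from the rest of the sequence. The abstract does not say how attention is aggregated across heads or layers, so the matrix values and the names `column_sums` and `sampling_order` are hypothetical.

```python
from typing import List, Sequence


def column_sums(attn: Sequence[Sequence[float]]) -> List[float]:
    """Sum each column of an attention matrix (rows = queries, columns = keys)."""
    n_cols = len(attn[0])
    return [sum(row[j] for row in attn) for j in range(n_cols)]


def sampling_order(attn: Sequence[Sequence[float]],
                   masked_positions: Sequence[int]) -> List[int]:
    """Order the still-masked positions by descending attention column sum."""
    sums = column_sums(attn)
    return sorted(masked_positions, key=lambda j: sums[j], reverse=True)


# Toy 4x4 attention matrix (assumed values, each row sums to 1).
attn = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.1, 0.2],
    [0.1, 0.3, 0.2, 0.4],
    [0.3, 0.4, 0.1, 0.2],
]

# Position 1 is already decoded; positions 0, 2 and 3 are still masked.
order = sampling_order(attn, masked_positions=[0, 2, 3])
print(order)
```

In this toy case the column sums for the masked positions are roughly 0.7, 0.6 and 0.9, so position 3 would be sampled first, then position 0, then position 2.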

To apply these insights, we introduce Attn-Sampler, a training-free sampling algorithm, and add dynamic attention thresholding to improve practical efficiency. Evaluations across multiple benchmarks show that the approach delivers higher generation quality along with improved sampling parallelism.
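The abstract does not describe the thresholding rule itself. As one possible reading, offered purely as an assumption and not as the Attn-Sampler procedure, a decoding step could unmask in parallel every masked position whose attention score is within a fixed fraction of the current best score. The function `select_parallel`, the `ratio` parameter and the scores below are all hypothetical.

```python
from typing import Dict, List


def select_parallel(col_sums: Dict[int, float], ratio: float = 0.8) -> List[int]:
    """Pick masked positions whose score is at least ratio * the best score.

    Assumed rule for illustration only; the cutoff moves with the best score
    at each step, which is what makes it "dynamic" in this sketch.
    """
    if not col_sums:
        return []
    cutoff = ratio * max(col_sums.values())
    chosen = [pos for pos, score in col_sums.items() if score >= cutoff]
    # Return the chosen positions from highest to lowest score.
    return sorted(chosen, key=lambda pos: col_sums[pos], reverse=True)


# Assumed attention column sums for three masked positions.
scores = {0: 0.7, 2: 0.6, 3: 0.9}
step = select_parallel(scores, ratio=0.75)
print(step)
```

With these toy scores the cutoff is 0.675, so positions 3 and 0 would be unmasked together and position 2 would wait for a later step. A lower `ratio` unmasks more tokens per step, while a ratio of 1.0 falls back to one token at a time unless scores tie.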

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC