Supportive Token Revealing for Fast Diffusion Language Model Decoding
Title: AXON: A Training-Free Token Reveal Strategy for Accelerated Diffusion Language Model Decoding
Abstract:
Discrete diffusion language models generate text efficiently by updating multiple masked positions in parallel, but this parallelism creates a trade-off between quality and latency. Aggressive decoding risks committing prematurely to mutually dependent tokens, whereas conservative decoding requires many denoising steps. Existing techniques address this tension by using confidence or dependency metrics to decide which tokens are safe enough to reveal. However, avoiding unsafe commitments does not guarantee that the remaining masked sequence is easy to decode: uncertain tokens may depend on other tokens that are still masked, which creates a bottleneck for the denoising process.
To address this, we introduce AXON, a training-free module designed to integrate with existing parallel decoding strategies for diffusion language models. Instead of replacing the base decoder, AXON monitors the remaining uncertain masked tokens and intervenes only when their state indicates a need for additional context. It shifts the selection criterion from revealing the "safest" tokens to revealing confident tokens that best support subsequent denoising. Using attention, uncertainty, and confidence signals, AXON identifies "anchors": confident masked tokens that uncertain positions attend to. Evaluations on reasoning and code-generation benchmarks across various diffusion language models show that AXON improves the quality-latency trade-off of existing parallel decoders, frequently reducing the number of function evaluations while preserving or improving accuracy.
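The anchor idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give AXON's actual scoring rule, so everything below is an assumption made for illustration: the function name `select_anchors`, the thresholds, the use of an entropy-like uncertainty value, and the score (uncertainty-weighted attention mass multiplied by confidence) are all hypothetical.

```python
from typing import List, Sequence


def select_anchors(
    attn: Sequence[Sequence[float]],   # attn[i][j]: attention from position i to position j
    confidence: Sequence[float],       # per-position confidence of the top prediction
    uncertainty: Sequence[float],      # per-position uncertainty (e.g. entropy); assumed signal
    masked: Sequence[bool],            # True where the position is still masked
    conf_threshold: float = 0.9,       # hypothetical threshold for "confident"
    unc_threshold: float = 1.0,        # hypothetical threshold for "uncertain"
    k: int = 1,                        # number of anchors to reveal
) -> List[int]:
    """Return up to k confident masked positions that uncertain positions attend to most.

    Illustrative scoring only, not the rule used by AXON.
    """
    n = len(masked)
    uncertain = [i for i in range(n) if masked[i] and uncertainty[i] > unc_threshold]
    candidates = [j for j in range(n) if masked[j] and confidence[j] >= conf_threshold]

    # Intervene only when there are uncertain tokens and confident candidates;
    # otherwise return nothing so the base decoder proceeds unchanged.
    if not uncertain or not candidates:
        return []

    def support(j: int) -> float:
        # Attention mass that uncertain positions place on j, weighted by their uncertainty.
        return sum(uncertainty[i] * attn[i][j] for i in uncertain if i != j)

    ranked = sorted(candidates, key=lambda j: support(j) * confidence[j], reverse=True)
    return ranked[:k]


# Toy example: position 0 is already revealed, positions 1-4 are masked.
masked = [False, True, True, True, True]
confidence = [1.0, 0.95, 0.92, 0.30, 0.25]
uncertainty = [0.0, 0.1, 0.2, 1.8, 2.0]
attn = [
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.1, 0.1, 0.6, 0.1, 0.1],  # uncertain position 3 attends mostly to position 2
    [0.1, 0.2, 0.5, 0.1, 0.1],  # uncertain position 4 attends mostly to position 2
]

anchors = select_anchors(attn, confidence, uncertainty, masked)  # [2]
```

In this toy input, a purely confidence-ranked reveal would pick position 1 first (confidence 0.95), whereas the support-weighted score ranks position 2 first because both uncertain positions attend to it most strongly. When no position is uncertain, the function returns an empty list, mirroring the abstract's description of a module that intervenes only when needed.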
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




