SAID: Accelerating Diffusion-Based Language Models via Scaffold-Aware Iterative Decoding
Abstract:
Diffusion Large Language Models (DLLMs) enable non-autoregressive generation by iteratively denoising corrupted token sequences while leveraging bidirectional context. Although these models can update multiple positions in parallel, inference is often expensive because high-quality output requires a large number of denoising steps. To address this, we introduce SAID (Scaffold-Aware Iterative Decoding), a framework that improves DLLM efficiency by redistributing computation across tokens.
SAID first spends denoising steps on "scaffold" tokens to establish the overall semantic structure, then resolves predictable "detail" tokens in significantly fewer steps. We also extend SAID to block-wise diffusion decoding with Confidence-Hierarchical Layered Generation (CHLG), a variant that allocates additional steps only to low-confidence tokens.
Benchmark evaluations on math, coding, and knowledge tasks with LLaDA-8B and LLaDA 1.5 show that SAID substantially speeds up DLLM inference, achieving up to 9.1x acceleration while maintaining competitive performance. The source code is publicly available at: https://github.com/TH-AI-Lab-PKU/SAID.
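As a rough illustration of the scaffold-then-detail idea described in the abstract, the sketch below builds a two-phase unmasking schedule. It is not taken from the SAID code base: the function names (`split_evenly`, `two_phase_schedule`), the `is_scaffold` flags, and the step counts are all hypothetical, and the abstract does not say how scaffold tokens are actually identified. The sketch simply assumes the flags are given and counts one denoising step per schedule entry.

```python
from typing import List


def split_evenly(positions: List[int], steps: int) -> List[List[int]]:
    """Split positions into at most `steps` contiguous, near-equal chunks."""
    steps = min(steps, len(positions))
    if steps <= 0:
        return []
    size, rem = divmod(len(positions), steps)
    chunks, start = [], 0
    for k in range(steps):
        end = start + size + (1 if k < rem else 0)
        chunks.append(positions[start:end])
        start = end
    return chunks


def two_phase_schedule(
    is_scaffold: List[bool], scaffold_steps: int, detail_steps: int
) -> List[List[int]]:
    """Return the positions to unmask at each denoising step.

    Phase 1 spreads the (assumed) scaffold positions over `scaffold_steps`
    steps; phase 2 resolves all remaining detail positions in `detail_steps`.
    """
    scaffold = [i for i, flag in enumerate(is_scaffold) if flag]
    detail = [i for i, flag in enumerate(is_scaffold) if not flag]
    return split_evenly(scaffold, scaffold_steps) + split_evenly(detail, detail_steps)


# Hypothetical 8-token sequence with 3 positions flagged as scaffold.
flags = [True, False, True, False, False, False, True, False]
schedule = two_phase_schedule(flags, scaffold_steps=3, detail_steps=1)
print(schedule)       # [[0], [2], [6], [1, 3, 4, 5, 7]]
print(len(schedule))  # 4 steps, versus 8 if one token were unmasked per step
```

In this toy setting the schedule needs 4 denoising steps instead of 8. The ratio only illustrates where savings could come from and says nothing about the speedups reported for SAID.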
Source: arXiv





