Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models
Original: arXiv:2601.12247v3 (announce type: replace-cross). Abstract: Diffusion Language Models (DLMs) present a promising non-sequential paradigm for text generation, distinct from standard autoregressive (AR) approaches. However, current decoding strategies often adopt a reactive stance, underutilizing the global bidirectional context to dictate global trajectories. To address this, we propose Plan-Verify-Fill (PVF), a training-free paradigm that grounds planning via quantitative validation. PVF actively constructs a hierarchical skeleton by prioritizing high-leverage semantic anchors and employs a verification protocol to operationalize pragmatic structural stopping where further deliberation yields diminishing returns. Extensive evaluations on LLaDA-8B-Instruct and Dream-7B-Instruct demonstrate that PVF reduces the Number of Function Evaluations (NFE) by up to 65% compared to confidence-based parallel decoding across benchmark datasets, unlocking superior efficiency without compromising accuracy.
Rewrite: Diffusion Language Models (DLMs) offer a compelling alternative to standard autoregressive (AR) methods by enabling non-sequential text generation. Existing decoding strategies, however, tend to be reactive and underuse the global bidirectional context when steering the overall generation trajectory. In response, we introduce Plan-Verify-Fill (PVF), a training-free framework that grounds planning in quantitative validation. PVF builds a hierarchical skeleton by prioritizing high-leverage semantic anchors, and uses a verification protocol as a pragmatic stopping criterion that halts planning once further deliberation yields diminishing returns. Evaluations on LLaDA-8B-Instruct and Dream-7B-Instruct show that PVF reduces the Number of Function Evaluations (NFE) by up to 65% relative to confidence-based parallel decoding across benchmark datasets, without compromising accuracy.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
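The abstract measures efficiency in NFE against confidence-based parallel decoding but does not spell out either. The following is a minimal sketch of that baseline only, not of PVF itself, under loud assumptions: `toy_denoiser`, `decode`, the `threshold` parameter, and the neighbour-based confidence rule are all hypothetical stand-ins invented for illustration, and the toy counts have no relation to the results reported in the paper. Each call to the denoiser counts as one function evaluation; every masked position whose confidence clears the threshold is committed in the same step.

```python
from typing import Dict, List, Optional, Tuple

MASK = None  # marker for a still-masked position


def toy_denoiser(seq: List[Optional[str]], target: List[str]) -> Dict[int, Tuple[str, float]]:
    """Hypothetical stand-in for one DLM forward pass.

    Returns {position: (token, confidence)} for every masked position.
    Assumption: confidence grows with the number of already-revealed
    neighbours (sequence boundaries count as revealed).
    """
    preds = {}
    for i, tok in enumerate(seq):
        if tok is not MASK:
            continue
        left = i == 0 or seq[i - 1] is not MASK
        right = i == len(seq) - 1 or seq[i + 1] is not MASK
        preds[i] = (target[i], 0.5 + 0.25 * (int(left) + int(right)))
    return preds


def decode(target: List[str], threshold: float) -> Tuple[List[Optional[str]], int]:
    """Confidence-threshold parallel decoding; returns (sequence, NFE)."""
    seq: List[Optional[str]] = [MASK] * len(target)
    nfe = 0
    while any(tok is MASK for tok in seq):
        preds = toy_denoiser(seq, target)
        nfe += 1  # one forward pass = one function evaluation
        # Commit every position that clears the threshold in parallel.
        chosen = [i for i, (_, conf) in preds.items() if conf >= threshold]
        if not chosen:
            # Guarantee progress: commit the single most confident position.
            chosen = [max(preds, key=lambda i: preds[i][1])]
        for i in chosen:
            seq[i] = preds[i][0]
    return seq, nfe


if __name__ == "__main__":
    tokens = list("ABCDEFGH")
    print(decode(tokens, threshold=0.75)[1])  # 4 (two positions per step)
    print(decode(tokens, threshold=1.1)[1])   # 8 (one position per step)
```

In this toy setting a threshold above every possible confidence degenerates to one token per forward pass, while a lower threshold commits several positions per pass and lowers NFE. This is the quantity the abstract reports PVF reducing further.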



