SemBlock: Semantic Boundary Dynamic Blocks for Diffusion LLMs
Abstract
Diffusion language models (DLMs) produce text via an iterative denoising process. While blockwise decoding enhances their practical utility by committing tokens within local segments, current approaches often depend on static block sizes or delimiter-based signals at runtime. These methods frequently fail to align with true semantic boundaries. To address this, we introduce SemBlock, a dynamic block decoding framework for DLMs that is driven by semantic boundaries. SemBlock treats dynamic block construction as a semantic boundary prediction task, employing lightweight predictors trained on frozen hidden states from LLaDA. For training supervision, we construct SemBound, a dataset containing semantic boundary labels derived from discourse units, reasoning steps, and implementation spans across natural language, mathematical, and coding tasks. At inference time, SemBlock uses the predicted boundary probabilities to determine where each dynamic block ends. Evaluations on GSM8K, IFEval, MATH, and HumanEval demonstrate that SemBlock consistently outperforms both fixed-block decoding and AdaBlock. The source code is publicly accessible at: https://github.com/TH-AI-Lab-PKU/SemBlock.
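
To make the inference-time idea concrete, the following is a minimal sketch of how per-position boundary probabilities could be turned into block end points. It is not the authors' implementation: the abstract does not specify the decision rule, so the threshold rule, the `threshold`, `min_len`, and `max_len` parameters, and the function names `choose_block_end` and `split_into_blocks` are all assumptions made for illustration. The sketch also treats the probabilities as a fixed list, whereas a real decoder would presumably obtain them from a predictor over hidden states as decoding proceeds.

```python
from typing import List, Sequence, Tuple


def choose_block_end(
    boundary_probs: Sequence[float],
    start: int,
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    min_len: int = 1,        # assumed lower bound on block length
    max_len: int = 32,       # assumed upper bound on block length
) -> int:
    """Return the exclusive end index of the block that begins at `start`.

    The block ends right after the first position whose boundary
    probability reaches `threshold`, provided the block is at least
    `min_len` tokens long. If no such position exists within `max_len`
    tokens, the block is cut at `max_len` (or at the end of the sequence).
    """
    n = len(boundary_probs)
    if not 0 <= start < n:
        raise ValueError("start must index into boundary_probs")

    hard_end = min(start + max_len, n)
    first_candidate = start + min_len - 1
    for i in range(first_candidate, hard_end):
        if boundary_probs[i] >= threshold:
            return i + 1
    return hard_end


def split_into_blocks(
    boundary_probs: Sequence[float], **kwargs
) -> List[Tuple[int, int]]:
    """Partition the whole sequence into (start, end) blocks, end exclusive."""
    blocks = []
    start = 0
    while start < len(boundary_probs):
        end = choose_block_end(boundary_probs, start, **kwargs)
        blocks.append((start, end))
        start = end
    return blocks


if __name__ == "__main__":
    # Hypothetical boundary probabilities for a 7-token sequence.
    probs = [0.1, 0.2, 0.9, 0.1, 0.1, 0.8, 0.3]
    print(split_into_blocks(probs, threshold=0.5, max_len=4))
```

Under these assumptions, a fixed-block decoder corresponds to the special case where no probability ever reaches the threshold, so every block falls back to `max_len`; the dynamic behaviour comes entirely from the predicted boundaries.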
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC






