arXiv

BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

June 2, 2026 · Justin Deschenaux, Caglar Gulcehre

Title: BlockGen: A Hybrid Sampler Approach to Flexible Blockwise Sequence Modeling

Abstract

Does the uniform-state diffusion framework offer a more robust paradigm for discrete diffusion than previously thought? Emerging evidence suggests it does. When paired with predictor-corrector samplers, Uniform-State Diffusion Models (USDMs) generate higher-quality samples than Masked Diffusion Models (MDMs). Furthermore, USDMs match or surpass MDMs on downstream performance metrics despite having higher perplexity. However, two critical limitations remain in the current literature. First, existing benchmarks evaluate uniform and masked diffusion using uninformed correctors that reintroduce noise at random locations, rather than targeting the tokens most likely to be wrong. Second, prior comparisons rely on full-sequence diffusion, leaving open the question of whether these findings hold when tokens are produced sequentially in blocks.

To resolve these gaps, we present BlockGen, a blockwise sequence modeling architecture implemented with both masked and uniform diffusion mechanisms. By training on a diverse mixture of block sizes, BlockGen achieves a more nuanced interpolation between Autoregressive (AR) and pure diffusion likelihoods than models constrained to a single block size. This architecture facilitates AR-informed predictor-corrector sampling (ARPC), a method that leverages both AR and diffusion predictions to regenerate low-probability tokens without the need for an auxiliary verifier.
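The contrast between an uninformed corrector and an informed one can be illustrated with a minimal sketch. It is not the authors' ARPC implementation: the function names, the geometric-mean combination of the two scores, and the fixed budget `k` are all assumptions made for illustration, since the abstract states only that both AR and diffusion predictions are used to find low-probability tokens.

```python
import random


def uninformed_positions(block_len, k, rng):
    """Uninformed corrector: choose k positions to re-noise uniformly at random."""
    return sorted(rng.sample(range(block_len), k))


def informed_positions(ar_probs, diff_probs, k):
    """Informed corrector: choose the k positions whose current token looks least likely.

    ar_probs[i] and diff_probs[i] are the probabilities that the AR and
    diffusion predictions assign to the token currently at position i.
    ASSUMPTION: the two are combined with a geometric mean; the source
    does not specify the combination rule.
    """
    if len(ar_probs) != len(diff_probs):
        raise ValueError("ar_probs and diff_probs must have the same length")
    scores = [(a * d) ** 0.5 for a, d in zip(ar_probs, diff_probs)]
    # Sort positions from least to most likely and keep the k least likely.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])


# Hypothetical per-token probabilities for a block of four tokens.
ar = [0.9, 0.1, 0.8, 0.7]
diff = [0.8, 0.2, 0.9, 0.05]
targets = informed_positions(ar, diff, k=2)
baseline = uninformed_positions(len(ar), 2, random.Random(0))
```

In this toy block, positions 1 and 3 have the lowest combined score, so the informed rule selects them for regeneration, whereas the uninformed rule may select any pair.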

Our experiments reveal distinct performance characteristics under different sampling regimes. During ancestral sampling, uniform diffusion outperforms masked diffusion in the block-by-block setting, a trend that is particularly pronounced in few-step scenarios. However, when utilizing ARPC, this advantage diminishes and eventually reverses at high numbers of function evaluations (NFE). Specifically, with a block size of 16, MDMs achieve marginally higher accuracy on GSM8K compared to USDMs. A comparable pattern is observed in Generative Perplexity metrics on OpenWebText.
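To make the relationship between block size and NFE concrete, here is a small sketch of block-by-block generation bookkeeping. The helper names are hypothetical, and the NFE count assumes one network call per denoising step per block with no extra corrector calls, which the abstract does not state.

```python
def block_spans(seq_len, block_size):
    """Split positions 0..seq_len-1 into consecutive (start, end) blocks."""
    if seq_len <= 0 or block_size <= 0:
        raise ValueError("seq_len and block_size must be positive")
    return [
        (start, min(start + block_size, seq_len))
        for start in range(0, seq_len, block_size)
    ]


def total_nfe(seq_len, block_size, steps_per_block):
    """NFE under the ASSUMPTION of one model call per step per block."""
    return len(block_spans(seq_len, block_size)) * steps_per_block


spans = block_spans(40, 16)        # last block is shorter than 16
nfe = total_nfe(1024, 16, 4)       # 64 blocks, 4 steps each
ar_like = block_spans(8, 1)        # block size 1: one token per block
full_seq = block_spans(8, 8)       # block size = length: a single block
```

Under these assumptions, a block size of 1 yields one block per token, as in AR decoding, and a block size equal to the sequence length yields a single block, as in full-sequence diffusion.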

Code repository: https://github.com/jdeschena/blockgen
