arXiv

BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

Abstract

Does the uniform-state diffusion framework offer a more robust paradigm for discrete diffusion than previously thought? Emerging evidence suggests it does. When paired with predictor-corrector samplers, Uniform-State Diffusion Models (USDMs) generate higher-quality samples than Masked Diffusion Models (MDMs). USDMs also match or surpass MDMs on downstream performance metrics despite having higher perplexity. However, two critical limitations remain in the current literature. First, existing benchmarks evaluate uniform and masked diffusion using uninformed correctors, which reintroduce noise at random locations rather than targeting the tokens most likely to be wrong. Second, prior comparisons rely on full-sequence diffusion, leaving open whether these findings hold when tokens are produced sequentially in blocks.
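The difference between an uninformed and an informed corrector comes down to how positions are chosen for re-noising. The sketch below is illustrative only and is not taken from the BlockGen code: the function names, the probability values, and the fixed budget `k` are all assumptions made for the example.

```python
import random


def uninformed_positions(num_tokens, k, rng):
    """Pick k positions uniformly at random, ignoring model confidence."""
    return sorted(rng.sample(range(num_tokens), k))


def informed_positions(token_probs, k):
    """Pick the k positions whose current token has the lowest probability."""
    ranked = sorted(range(len(token_probs)), key=lambda i: token_probs[i])
    return sorted(ranked[:k])


# Hypothetical per-token probabilities of the current sample under the model.
token_probs = [0.91, 0.05, 0.88, 0.97, 0.12, 0.76]

rng = random.Random(0)
random_pick = uninformed_positions(len(token_probs), 2, rng)
targeted_pick = informed_positions(token_probs, 2)

print("uninformed corrector re-noises:", random_pick)
print("informed corrector re-noises:  ", targeted_pick)
```

In this toy setting the informed rule always selects positions 1 and 4, the two least probable tokens, whereas the uninformed rule may spend its budget on tokens the model is already confident about.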

To resolve these gaps, we present BlockGen, a blockwise sequence modeling architecture implemented with both masked and uniform diffusion mechanisms. By training on a diverse mixture of block sizes, BlockGen achieves a more nuanced interpolation between Autoregressive (AR) and pure diffusion likelihoods than models constrained to a single block size. This architecture facilitates AR-informed predictor-corrector sampling (ARPC), a method that leverages both AR and diffusion predictions to regenerate low-probability tokens without the need for an auxiliary verifier.
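One way to picture how block size interpolates between AR and full-sequence diffusion is through the dependency pattern between positions. The sketch below is a simplified illustration under stated assumptions, not the BlockGen implementation: it assumes each token may depend on every token in its own block and in all earlier blocks, and the candidate block sizes and the helper name `block_causal_mask` are hypothetical.

```python
import random


def block_causal_mask(seq_len, block_size):
    """Return mask[i][j] == True when position i may depend on position j.

    Assumption: a token sees its own block and all earlier blocks.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


seq_len = 4

# Block size 1 gives the lower-triangular (autoregressive) pattern.
ar_mask = block_causal_mask(seq_len, 1)

# Block size equal to the sequence length gives the fully bidirectional
# pattern of full-sequence diffusion.
full_mask = block_causal_mask(seq_len, seq_len)

# Intermediate block sizes sit between the two extremes.
mid_mask = block_causal_mask(seq_len, 2)

# Hypothetical mixture of block sizes: draw one size per training example.
rng = random.Random(0)
candidate_block_sizes = [1, 2, 4]
sampled_size = rng.choice(candidate_block_sizes)

for row in mid_mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With `block_size=2` and four positions, the first two rows allow only the first block, and the last two rows allow all four positions.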

Our experiments reveal distinct performance characteristics under different sampling regimes. Under ancestral sampling, uniform diffusion outperforms masked diffusion in the block-by-block setting, a trend that is particularly pronounced in few-step scenarios. With ARPC, however, this advantage diminishes and eventually reverses at high numbers of function evaluations (NFE). Specifically, with a block size of 16, MDMs achieve marginally higher accuracy than USDMs on GSM8K. A comparable pattern is observed in Generative Perplexity on OpenWebText.

Code repository: https://github.com/jdeschena/blockgen


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
