Global News Digest

arXiv

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

Abstract:

While discrete masked diffusion language models utilize iterative parallel decoding for text generation, they face a significant dilemma during few-step inference: a trade-off between output length and quality. Given a fixed step budget, conventional approaches are forced to choose between generating concise, high-quality text or producing lengthy but repetitive content. Continuous denoising offers a solution to this limitation by jointly evolving all text positions within embedding space. However, developing such a model from the ground up at a large scale remains an unresolved challenge.

In this work, we demonstrate that a pretrained masked Diffusion Language Model (DLM) can be efficiently adapted to perform continuous denoising in embedding space. Starting from LLaDA-8B-Instruct, we run a lightweight continued-pretraining phase of just 1,000 steps using Discrete Stochastic Localization (DSL). This training replaces traditional binary masking with continuous, per-token Gaussian noise, which acts as a soft mask. The resulting model supports continuous inference: all positions evolve simultaneously in embedding space, and hard token commitment is postponed until the final step.
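The contrast between a binary mask and a per-token Gaussian "soft mask" can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `soft_mask` and `commit_tokens`, the variance-preserving mixing rule, the nearest-embedding commitment, and the toy embedding table are all assumptions chosen for illustration, and the denoising model itself is omitted.

```python
import numpy as np


def soft_mask(embeddings, noise_levels, rng):
    """Corrupt each token embedding with its own Gaussian noise level.

    embeddings:   (seq_len, dim) clean token embeddings
    noise_levels: (seq_len,) values in [0, 1]; 0 keeps the token clean,
                  1 replaces it with pure noise (the analogue of a hard mask)
    The variance-preserving mix below is an assumed schedule, not one
    taken from the abstract.
    """
    sigma = np.asarray(noise_levels, dtype=float)[:, None]
    noise = rng.standard_normal(embeddings.shape)
    return np.sqrt(1.0 - sigma**2) * embeddings + sigma * noise


def commit_tokens(states, embedding_table):
    """Final-step commitment: map each continuous state to the id of the
    nearest vocabulary embedding (squared Euclidean distance)."""
    diffs = states[:, None, :] - embedding_table[None, :, :]
    return (diffs**2).sum(axis=-1).argmin(axis=1)


# Toy setup: a hypothetical 50-token vocabulary with 16-dimensional embeddings.
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((50, 16))
token_ids = np.array([3, 7, 7, 1])
clean = embedding_table[token_ids]

# Each position receives a different noise level instead of a 0/1 mask.
noise_levels = np.array([0.0, 0.3, 0.9, 1.0])
noisy = soft_mask(clean, noise_levels, rng)

# Committing the uncorrupted states recovers the original token ids.
recovered = commit_tokens(clean, embedding_table)
```

In a full sampler, a denoising model would repeatedly refine every position of `noisy` in embedding space, and `commit_tokens` (or a comparable projection) would run only once, after the last step.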

In zero-shot summarization with low step budgets (at most 16 forward passes), DSL-LLaDA-SDE achieves the highest ROUGE-1 scores on four benchmarks. Notably, it largely avoids the premature termination and repetition issues commonly associated with iterative unmasking. The adaptation also confers selective robustness to noisy states: the model can correct corrupted tokens without disturbing those that are already clean. Control experiments using standard masked diffusion training with an equivalent compute budget did not exhibit these behaviors.
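For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference. The sketch below is a simplified version that assumes lowercased whitespace tokenization and no stemming; the abstract does not say which ROUGE implementation or variant (precision, recall, or F1) was reported, so the function name `rouge1` and its return format are illustrative only.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, f1) for clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection clips each word's count to the smaller of the
    # two sides, so repeating a word does not earn extra credit.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A short but accurate summary: perfect precision, limited recall.
short_scores = rouge1("the cat sat", "the cat sat on the mat")

# A repetitive output: clipping keeps the overlap at one unigram.
repetitive_scores = rouge1("the the the the", "the cat")
```

Because overlap counts are clipped, degenerate repetition lowers precision, while an output that terminates too early lowers recall, which is why both failure modes mentioned above show up in this metric.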


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators to ease ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.