arXiv

D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models

Title: D^2SD: Enhancing Speculative Decoding Speed via Dual Diffusion Drafting Mechanisms

Abstract: Speculative decoding speeds up autoregressive large language model inference by proposing multiple tokens and validating them together in a single forward pass of the target model. Recent diffusion-based drafters can generate token blocks in parallel, but they typically commit to a single draft sequence during verification. Consequently, once the first mismatch occurs, all subsequent tokens in that draft are discarded, which limits the acceptance rate. Simply increasing the number of batched draft candidates yields only marginal gains, because redundant or ill-positioned branches raise drafting and verification costs without a proportional rise in accepted tokens. To address this, we introduce D^2SD, a speculative decoding framework that uses dual diffusion drafters and organizes candidates in a confidence-guided prefix tree. First, a diffusion drafter produces a token block together with per-position confidence scores, which are used to locate the most probable rejection boundary and to select the top-K prefix ranges for recovery. Next, a second, variable-prefix diffusion drafter re-anchors at each chosen prefix and proposes alternative continuations in a single batched operation. These candidates, which share common prefixes, are then jointly verified using cascade attention. Experimental results demonstrate that D^2SD significantly outperforms both the base diffusion method and strong autoregressive speculative decoding baselines.
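The confidence-guided prefix selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how confidences are turned into a rejection boundary, so the sketch assumes each per-position confidence acts as an independent acceptance probability. The function names (`accepted_prefix_length`, `first_rejection_probs`, `top_k_prefix_lengths`) are hypothetical, and the second drafter and cascade-attention verification are left out.

```python
from typing import List, Sequence


def accepted_prefix_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count the leading draft tokens that match the target model's tokens.

    Everything after the first mismatch is discarded, which is the
    single-sequence limitation the abstract describes.
    """
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def first_rejection_probs(confidences: Sequence[float]) -> List[float]:
    """Probability that position i is the first rejected position.

    ASSUMPTION (not from the paper): each confidence is treated as an
    independent per-position acceptance probability.
    """
    probs = []
    survive = 1.0  # probability that every earlier position was accepted
    for c in confidences:
        probs.append(survive * (1.0 - c))
        survive *= c
    return probs


def top_k_prefix_lengths(confidences: Sequence[float], k: int) -> List[int]:
    """Pick the k prefix lengths most likely to end right before a rejection.

    A returned value p means "keep the first p draft tokens and redraft
    from position p". Results are ordered from most to least likely.
    """
    probs = first_rejection_probs(confidences)
    ranked = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    return ranked[:k]


if __name__ == "__main__":
    conf = [0.9, 0.8, 0.5, 0.9]
    print(top_k_prefix_lengths(conf, k=2))
    print(accepted_prefix_length([5, 7, 9, 2], [5, 7, 1, 2]))
```

In a fuller pipeline, each selected prefix length would be handed to the second drafter as a re-anchoring point, and the resulting candidates would be verified together because they share prefixes. Both steps are only named in the abstract and are not modeled here.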


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
