Global News Digest

arXiv

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

Original: arXiv:2606.00487v1, Announce Type: new

Abstract: Using a diffusion model for parallel drafting is a promising approach for speculative decoding. By predicting tokens at multiple future positions in a single forward pass, diffusion drafters substantially reduce drafting latency. However, this shifts the bottleneck to verification: verifying a single sequence limits acceptance length, while verifying large draft trees incurs excessive target-model latency. We identify a key mismatch in existing draft-tree methods: existing diffusion-tree methods rank nodes by the marginal probability, ignoring that verification is prefix-conditioned. As a result, they may verify unreachable descendants of rejected prefixes, increasing latency with limited acceptance gains. To address this, we propose TAPS, a target-aware prefix selection method that turns diffusion marginals into path-conditioned acceptance estimates. TAPS then selects a compact prefix-closed subtree under a fixed verification budget, improving the acceptance-cost tradeoff rather than simply expanding the draft tree. Experiments across diverse datasets and model families demonstrate that TAPS achieves up to 7.9x lossless end-to-end speedup over vanilla autoregressive decoding, outperforming state-of-the-art DFlash and DDTree by 1.36x and 1.74x respectively. Our work is available at https://anonymous.4open.science/r/TAPS-EMNLP2026-53DD

Rewrite:

Abstract: Using a diffusion model for parallel drafting is a promising approach to speculative decoding. Because a diffusion drafter predicts tokens at multiple future positions in a single forward pass, it substantially reduces drafting latency. This shifts the bottleneck to verification, however: verifying a single sequence limits the acceptance length, while verifying a large draft tree incurs excessive target-model latency.

We identify a key mismatch in existing draft-tree methods: diffusion-tree approaches rank nodes by marginal probability, ignoring the fact that verification is prefix-conditioned. As a result, they may spend verification effort on descendants that become unreachable once an earlier prefix is rejected, which increases latency while yielding only limited acceptance gains.
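To make the mismatch concrete, the toy sketch below ranks the nodes of a two-level draft tree first by marginal probability alone and then by a path-conditioned score. The probabilities, the `path_score` helper, and the choice of multiplying marginals along the path as a proxy for path acceptance are all illustrative assumptions; the abstract does not say how the estimates are actually computed.

```python
# Toy draft tree: each node is identified by its token path from the root.
# The values are hypothetical per-node draft marginals, not data from the paper.
marginal = {
    ("a",): 0.9,
    ("b",): 0.1,
    ("a", "x"): 0.5,
    ("b", "y"): 0.8,
}


def path_score(node):
    """Assumed proxy: product of marginals over the node and all its ancestors."""
    score = 1.0
    for depth in range(1, len(node) + 1):
        score *= marginal[node[:depth]]
    return score


# Ranking by marginal alone puts ("b", "y") second, although its parent
# ("b",) is very unlikely to be accepted.
by_marginal = sorted(marginal, key=marginal.get, reverse=True)

# Ranking by the path score pushes ("b", "y") to the bottom, because it is
# reachable only if ("b",) is accepted first.
by_path = sorted(marginal, key=path_score, reverse=True)

print(by_marginal)
print(by_path)
```

In this toy case the marginal ranking would spend verification budget on a node that is almost never reached, which is the kind of waste the paragraph above describes.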

To address this, we propose TAPS (Target-Aware Prefix Tree Selection), a target-aware prefix selection method that turns diffusion marginals into path-conditioned acceptance estimates. TAPS then selects a compact, prefix-closed subtree under a fixed verification budget. This improves the tradeoff between acceptance and verification cost rather than simply expanding the draft tree.
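As a rough illustration of selecting a prefix-closed subtree under a fixed budget, the sketch below runs a greedy best-first search over a small draft tree. It is not the TAPS algorithm: the function name `select_prefix_closed`, the dictionary-based tree layout, and the use of a product of per-node probabilities as the path acceptance estimate are assumptions made only for this example.

```python
import heapq


def select_prefix_closed(children, prob, budget):
    """Greedily pick at most `budget` nodes with the highest path scores.

    children: maps a node (tuple path; the root is ()) to its child nodes.
    prob: maps a node to an assumed acceptance probability given that its
          parent was accepted (hypothetical values).
    A child enters the heap only after its parent has been selected, so the
    returned set is prefix-closed by construction.
    """
    selected = []
    # heapq is a min-heap, so scores are stored negated.
    heap = [(-prob[child], child) for child in children.get((), [])]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_score, node = heapq.heappop(heap)
        selected.append(node)
        for child in children.get(node, []):
            # Path score of the child = path score of the parent * prob[child].
            heapq.heappush(heap, (neg_score * prob[child], child))
    return selected


# Hypothetical toy tree and probabilities.
children = {
    (): [("a",), ("b",)],
    ("a",): [("a", "x")],
    ("b",): [("b", "y")],
}
prob = {("a",): 0.9, ("b",): 0.1, ("a", "x"): 0.5, ("b", "y"): 0.8}

picked = select_prefix_closed(children, prob, budget=2)
print(picked)
```

With a budget of two nodes, the sketch keeps `("a",)` and `("a", "x")` and never considers `("b", "y")` ahead of its own parent. Because a child's path score can never exceed its parent's under this product assumption, the greedy order also never needs to revisit an earlier choice.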

Experiments across diverse datasets and model families show that TAPS achieves a lossless end-to-end speedup of up to 7.9x over vanilla autoregressive decoding, outperforming the state-of-the-art methods DFlash and DDTree by 1.36x and 1.74x, respectively. The work is available at https://anonymous.4open.science/r/TAPS-EMNLP2026-53DD


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC