Global News Digest

arXiv

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

Abstract: Speculative decoding accelerates autoregressive generation by having a drafter model propose several tokens, which a verifier then validates in parallel. In resource-constrained settings, the drafter uses a sparse key-value (KV) cache to cap peak GPU memory and reduce end-to-end latency within a fixed KV budget, while the verifier keeps a full KV cache. Mid-to-long context inference (4K to 16K tokens) is common in practice, yet naively pairing a sparse-cache drafter with a full-cache verifier degrades as context length grows: the drafter's sparse state diverges from the verifier's full state, and token acceptance rates fall rapidly. To address this, we introduce BudgetDraft, a multi-view sparse training framework for drafting in mid-to-long context inference. During training, the drafter is exposed to multiple sampled KV budgets and learns to align each sparse view with a single full-cache teacher target. BudgetDraft combines an acceptance-aware loss on the full-cache branch with a multi-view loss on the sparse-cache branch, yielding a single drafter that is robust to changes in budget. This restores acceptance rates across sparsity levels without adding any components at inference time. On PG-19, LongBench, and LWM, BudgetDraft achieves end-to-end speedups of up to 6.55x, 4.46x, and 2.10x over autoregressive (AR) decoding at context lengths of 4K, 8K, and 16K, respectively, while keeping the inference pipeline memory-efficient.
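To make the draft-then-verify loop and the notion of an acceptance rate concrete, here is a minimal sketch of one round of speculative decoding under greedy decoding. It is an illustration only, not the BudgetDraft method: the function `speculative_step` and the toy models `toy_verifier` and `toy_drafter` are hypothetical names, the "models" are plain functions over integer tokens, and the sparse versus full KV cache is not modeled at all. A real verifier scores all drafted positions in one parallel forward pass, which this sketch replaces with a sequential loop for clarity.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft_next: NextToken, verify_next: NextToken, k: int
) -> Tuple[List[int], int]:
    """Run one greedy draft-then-verify round.

    Returns (new_tokens, n_accepted): the tokens appended to the sequence
    and how many of the k drafted tokens the verifier accepted.
    """
    # 1) The drafter proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) The verifier checks each drafted token against its own greedy choice.
    new_tokens: List[int] = []
    ctx = list(prefix)
    for n_accepted, tok in enumerate(proposed):
        target = verify_next(ctx)
        if tok != target:
            # First mismatch: keep the verifier's token and discard the rest.
            new_tokens.append(target)
            return new_tokens, n_accepted
        new_tokens.append(tok)
        ctx.append(tok)

    # 3) All k drafts accepted: the verifier contributes one extra token.
    new_tokens.append(verify_next(ctx))
    return new_tokens, k


def toy_verifier(ctx: List[int]) -> int:
    # Hypothetical verifier: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[int]) -> int:
    # Hypothetical drafter: agrees with the verifier except after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, accepted = speculative_step([0], toy_drafter, toy_verifier, k=4)
    print(tokens, accepted, accepted / 4)
```

In this toy run the drafter's fourth proposal disagrees with the verifier, so three of four drafted tokens are accepted. The earlier the first disagreement occurs, the fewer tokens each round yields, which is why a falling acceptance rate erodes the speedup.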


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
