arXiv

MineDraft: A Framework for Batch Parallel Speculative Decoding

June 2, 2026 · Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low

Abstract: Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose candidate tokens, which a larger target model then verifies. However, standard SD is often bottlenecked by the strictly sequential execution of its drafting and verification phases. To overcome this limitation, we introduce MineDraft, a batch parallel speculative decoding (PSD) framework designed to hide drafting latency by overlapping it with verification. Our theoretical analysis indicates that PSD is substantially more efficient than standard SD. MineDraft realizes PSD through a batch-parallel design that maintains two batches of requests, drafting tokens for one batch while verifying those of the other. Experiments show that MineDraft improves throughput by up to 75% and reduces end-to-end latency by up to 39% relative to standard SD. We have also implemented MineDraft as a plugin for vLLM, demonstrating its practicality for production-grade inference systems.
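
The two-batch overlap described in the abstract can be illustrated with a toy timing model. The sketch below is not the paper's analysis or the MineDraft implementation. It assumes constant per-batch stage times (`t_draft`, `t_verify`), a drafter and verifier that run on separate resources without contention, and exactly two batches named "A" and "B". All function names and numbers are hypothetical.

```python
from typing import List, Optional, Tuple

Job = Tuple[str, int]  # (batch name, round index); naming is illustrative only


def sequential_time(rounds: int, t_draft: float, t_verify: float) -> float:
    """Toy model of standard SD: two batches, each round drafts and then
    verifies, with no overlap between the two stages."""
    return 2 * rounds * (t_draft + t_verify)


def overlapped_schedule(
    rounds: int, t_draft: float, t_verify: float
) -> Tuple[float, List[Tuple[Job, Optional[Job], float]]]:
    """Toy model of the two-batch overlap: while one batch is verified,
    the other batch is drafted.

    Returns the total time and a list of slots, where each slot is
    (job being verified, job being drafted or None, slot duration).
    """
    if rounds <= 0:
        return 0.0, []

    # Alternate the batches: A0, B0, A1, B1, ...
    # A batch's next round is only drafted after its previous round
    # was verified in the preceding slot.
    jobs: List[Job] = [(b, r) for r in range(rounds) for b in ("A", "B")]

    slots: List[Tuple[Job, Optional[Job], float]] = []
    clock = t_draft  # warm-up: the first draft has nothing to overlap with
    for i, job in enumerate(jobs):
        nxt = jobs[i + 1] if i + 1 < len(jobs) else None
        # A slot lasts as long as the slower of the two concurrent stages;
        # the final slot only verifies.
        duration = max(t_draft, t_verify) if nxt is not None else t_verify
        slots.append((job, nxt, duration))
        clock += duration
    return clock, slots


if __name__ == "__main__":
    total, slots = overlapped_schedule(rounds=3, t_draft=1.0, t_verify=4.0)
    print("sequential:", sequential_time(3, 1.0, 4.0))
    print("overlapped:", total)
    for verified, drafted, duration in slots:
        print(f"verify {verified} | draft {drafted} | {duration}")
```

With the hypothetical stage times of 1 and 4 time units over 3 rounds, this model gives 30 units for the sequential schedule and 25 for the overlapped one. In this simplified setting the saving is bounded by the drafting time, since verification still has to run for every round; real systems would also see batch-size effects and resource contention that the sketch ignores.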
