DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation
Title: DREAM-S: Enhancing Multimodal Generation via Speculative Decoding with Searchable Drafting and Target-Aware Refinement
Abstract: While speculative decoding (SD) has demonstrated significant efficacy in accelerating autoregressive generation for large language models (LLMs), its potential within vision-language models (VLMs) has seen limited investigation. To address this gap, we introduce DREAM-S, a specialized SD framework engineered to optimize speed and efficiency during VLM decoding. This approach utilizes a neural architecture search (NAS) system, combined with target-aware supernet training, to autonomously determine the most effective interaction protocols between draft and target models, as well as the ideal draft model architecture tailored to specific hardware implementations. Furthermore, DREAM-S employs adaptive intermediate feature distillation, regulated by attention entropy, to streamline the draft training process. Empirical evaluations across various established VLMs reveal that DREAM-S delivers a maximum speedup of $3.85\times$ relative to conventional decoding methods, substantially surpassing current SD baselines. The implementation is open-source and accessible at: https://github.com/SAI-Lab-NYU/DREAM-S .
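For readers new to speculative decoding, the draft-then-verify loop the abstract builds on can be sketched as below. This is a generic, greedy, text-only illustration and not the DREAM-S implementation: it omits the NAS-searched draft architecture, the target-aware supernet training, and the entropy-regulated distillation. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the two "models" are toy integer functions assumed purely for demonstration.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain autoregressive greedy decoding with the target model."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output equals greedy_decode(target, ...)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1) The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2) The target verifies the proposal. A real system scores all
        #    positions in one batched forward pass; this loop is sequential
        #    only to keep the sketch simple.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
        else:
            # Every drafted token matched: take one extra target token if room.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens


def toy_target(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large target model.
    return (prefix[-1] * 3 + 1) % 17


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical draft: agrees with the target except at every third length.
    guess = toy_target(prefix)
    return (guess + 1) % 17 if len(prefix) % 3 == 0 else guess
```

Because every emitted token is either confirmed or supplied by the target, the result matches plain greedy decoding with the target; any speedup comes from how many drafted tokens are accepted per verification step.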
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





