WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering
Title: WaveFilter: Boosting Long-Context Performance in Diffusion LLMs Through Wavelet-Guided KV Cache Filtering
Abstract:
Diffusion Large Language Models (DLMs) have shown substantial promise across a wide range of applications. However, their multi-step iterative inference incurs significant computational cost and high latency on long contexts, which hinders widespread adoption. On extended sequences, current Key-Value (KV) caching strategies struggle to maintain generation quality as context length grows. The primary obstacle is identifying the essential tokens in these ultra-long contexts both efficiently and accurately. Drawing inspiration from human reading patterns, we introduce WaveFilter, a novel, training-free, and universally applicable caching framework. WaveFilter uses wavelet transforms to decompose long sequences and detect pivotal tokens precisely. A sparse KV cache built from these tokens is then used to derive the final contextual representation. Our experiments show that WaveFilter is a versatile, plug-and-play solution that markedly improves the effectiveness of mainstream KV cache methods in complex long-context scenarios.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
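The abstract describes decomposing a long sequence with wavelet transforms, picking out pivotal tokens, and keeping only those in a sparse KV cache. The sketch below illustrates that general idea and is not the authors' implementation. It assumes a Haar wavelet, a hypothetical per-token importance score (for example, attention mass) as the signal being decomposed, and a simple "score plus detail energy" ranking. The function names `haar_detail_energy`, `select_tokens`, and `sparse_kv` are invented for illustration.

```python
import numpy as np


def haar_detail_energy(scores, levels=3):
    """Per-token saliency from multi-level Haar detail coefficients.

    `scores` is an assumed 1-D per-token importance signal; the abstract
    does not say which signal WaveFilter actually decomposes.
    """
    scores = np.asarray(scores, dtype=np.float64)
    n = len(scores)
    # Pad so the length is divisible by 2**levels.
    pad = (-n) % (1 << levels)
    approx = np.pad(scores, (0, pad), mode="edge")
    saliency = np.zeros(n + pad)
    for level in range(1, levels + 1):
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        # Each detail coefficient at this level spans 2**level tokens.
        saliency += np.repeat(np.abs(detail), 1 << level)
    return saliency[:n]


def select_tokens(scores, keep, levels=3):
    """Return sorted indices of the `keep` highest-ranked tokens (keep >= 1)."""
    scores = np.asarray(scores, dtype=np.float64)
    combined = scores + haar_detail_energy(scores, levels)
    keep = min(keep, len(combined))
    idx = np.argpartition(-combined, keep - 1)[:keep]
    return np.sort(idx)


def sparse_kv(keys, values, idx):
    """Keep only the selected token rows of the key and value tensors."""
    return keys[idx], values[idx]


if __name__ == "__main__":
    scores = np.zeros(16)
    scores[5] = 1.0  # one pivotal token in an otherwise flat signal
    idx = select_tokens(scores, keep=4)
    keys = np.random.default_rng(0).normal(size=(16, 8))
    values = np.random.default_rng(1).normal(size=(16, 8))
    k, v = sparse_kv(keys, values, idx)
    print(idx, k.shape, v.shape)
```

The multi-level detail coefficients mark where the signal changes at several scales, so the neighbours of a salient token are ranked above tokens in flat regions. A real system would presumably apply such a selection per layer or per head and at each denoising step, but the abstract does not specify those details.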




