Global News Digest

arXiv

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

Abstract:

Vision-language model (VLM) agents for graphical user interface (GUI) interaction show strong potential for automation, but their practical deployment is hindered by the key-value (KV) cache, which grows linearly as interaction steps accumulate. For example, UI-TARS-1.5-7B requires 76 GB of GPU memory to process just five screenshots, nearly saturating a standard 80 GB accelerator. Existing KV compression methods generally rest on two structural premises: that visual-token importance can be collapsed into a single saliency map, and that a rigid top-B cutoff can be applied to the combined score distribution. Initial measurements challenge both assumptions: spatial specialization operates at the attention-subspace level and shifts across layers, while the shape of the score distribution changes throughout the sequence.
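
To make the linear growth concrete, the following is a rough back-of-the-envelope sketch rather than a measurement from the paper. The layer count, KV-head count, head dimension, and tokens-per-screenshot values are hypothetical placeholders, not the confirmed UI-TARS-1.5-7B configuration, and the estimate covers only the cached key and value tensors (not model weights or activations), so it is not expected to reproduce the 76 GB figure.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Size of the KV cache for one sequence.

    The leading factor 2 accounts for storing one key tensor and one
    value tensor per layer; bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Hypothetical configuration, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 28, 4, 128
TOKENS_PER_SCREENSHOT = 2500  # assumed visual tokens per screenshot

if __name__ == "__main__":
    for screenshots in range(1, 6):
        tokens = screenshots * TOKENS_PER_SCREENSHOT
        full = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM)
        # A 20% KV-cache budget keeps one fifth of the cached tokens.
        kept = kv_cache_bytes(tokens // 5, LAYERS, KV_HEADS, HEAD_DIM)
        print(f"{screenshots} screenshots: full {full / 2**20:.1f} MiB, "
              f"20% budget {kept / 2**20:.1f} MiB")
```

Because every term except the token count is fixed by the model, the cache size is directly proportional to the number of accumulated screenshot tokens, which is why a per-step budget is the natural lever for compression.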

To address these limitations, we introduce STaR-KV (Spatio-Temporal Adaptive Re-weighting), a training-free framework for compressing KV caches. This method recalibrates token importance across three distinct dimensions: (i) subspace-aware scoring, which leverages online spatial mutual information; (ii) a temporal stability discount mechanism that filters out redundant entries from subspaces that remain persistently attended; and (iii) an entropy-based temperature parameter that dynamically adjusts the score distribution. Evaluated across four GUI benchmarks, STaR-KV delivers the highest average accuracy among leading KV compression techniques, such as SnapKV and GUIKV, under equivalent memory constraints. Notably, it introduces negligible computational overhead (-0.07% in FLOPs) during compression and reduces peak GPU memory usage by approximately 40% when operating at a 20% KV-cache budget. The project code is accessible at https://github.com/kawhiiiileo/STaR-KV.
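
The abstract does not give the scoring formulas, so the following is only a toy sketch of how the three ingredients might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `select_kv`, the per-subspace score lists, the `stability` values, the linear discount `1 - discount * stability`, and the rule that maps normalized entropy to a temperature. The subspace-aware scoring based on spatial mutual information is not modeled; the raw per-subspace scores are simply taken as given.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def select_kv(subspace_scores, stability, budget_ratio, discount=0.5):
    """Return the sorted indices of the tokens to keep in the cache.

    subspace_scores: one list of raw token scores per attention subspace
                     (hypothetical input; the paper's scoring is not shown).
    stability:       per-subspace value in [0, 1]; 1 means the subspace has
                     been persistently attended in earlier steps.
    budget_ratio:    fraction of tokens to keep, e.g. 0.2 for a 20% budget.
    """
    num_tokens = len(subspace_scores[0])
    combined = [0.0] * num_tokens
    for scores, stab in zip(subspace_scores, stability):
        # Assumed rule: flatter (higher-entropy) score distributions are
        # sharpened with a lower temperature, in the range [0.5, 1.0].
        entropy = normalized_entropy(softmax(scores))
        temperature = 1.0 - 0.5 * entropy
        probs = softmax(scores, temperature)
        # Assumed rule: persistently attended subspaces are down-weighted.
        weight = 1.0 - discount * stab
        for i, p in enumerate(probs):
            combined[i] += weight * p
    keep = max(1, round(budget_ratio * num_tokens))
    ranked = sorted(range(num_tokens), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:keep])
```

Note that the temperature only matters here because scores from several subspaces are summed before ranking; rescaling a single score vector would leave its top-ranked tokens unchanged.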


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
