arXiv

TGV-KV: Text-Grounded KV Eviction for Vision-Language Models

Abstract: Vision-Language Models (VLMs) typically employ an auto-regressive generation framework, caching the keys and values (KV) of all preceding tokens to speed up inference. However, this practice causes memory usage to grow linearly with context length. This problem is especially acute in VLMs because the visual modality contains significant redundancy. While KV cache eviction techniques can lower memory demands, they frequently lead to notable performance drops in VLMs. This occurs because most existing eviction strategies are tailored for language models and fail to account for the fundamental disparity between text and vision. In this study, we systematically examine the modality gap within VLMs, positing that the significance of visual data should be evaluated through textual guidance. Based on this insight, we introduce TGV-KV, a Text-Grounded KV Eviction method designed for VLMs. TGV-KV integrates three distinct components: (1) Text-Vision Budgeting (TVB), which distributes resources to each layer according to mutual information interactions; (2) Text-Weighted Ranking (TWR), which determines the priority of text and ranks visual importance using weighted text-image attention; and (3) Text-Prioritised Retention (TPR), a strategy that safeguards text KV to prevent severe information loss. We tested TGV-KV on five models of varying sizes and architectures. The results demonstrate that TGV-KV maintains 99.2% of full-KV accuracy on the VizWiz-VQA task when using LLaVA-NeXT, and increases end-to-end throughput by 52.6% under an extreme retention budget of 5%. The implementation is accessible at https://github.com/Danielement321/TGV-KV.
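To make the text-guided idea concrete, the following is a minimal sketch and not the authors' implementation. It illustrates only the general pattern the abstract describes: always retain text KV entries, and rank visual KV entries by a weighted sum of the attention they receive from text tokens. The function name `select_kv_to_keep`, its parameters, the uniform default weights, and the rule that text entries are kept even if they exceed the budget are all assumptions made for illustration. The per-layer budgeting step (TVB) is omitted.

```python
from typing import List, Optional, Sequence


def select_kv_to_keep(
    attn: Sequence[Sequence[float]],
    is_text: Sequence[bool],
    budget_ratio: float,
    text_weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return the sorted cache positions to retain (hypothetical helper).

    attn[q][k]  : attention from the q-th text query to key position k.
    is_text[k]  : True if key position k holds a text token.
    budget_ratio: fraction of all cache positions that may be kept.
    text_weights: per-query weights; uniform if omitted (an assumption,
                  the abstract does not specify how weights are chosen).
    """
    n_keys = len(is_text)
    n_queries = len(attn)
    if text_weights is None:
        text_weights = [1.0 / n_queries] * n_queries if n_queries else []

    budget = max(1, int(budget_ratio * n_keys))

    text_idx = [k for k in range(n_keys) if is_text[k]]
    visual_idx = [k for k in range(n_keys) if not is_text[k]]

    # Score each visual position by weighted text-to-image attention.
    scores = {
        k: sum(w * row[k] for w, row in zip(text_weights, attn))
        for k in visual_idx
    }

    # Text entries are always kept; visual entries fill what is left.
    n_visual = max(0, budget - len(text_idx))
    top_visual = sorted(visual_idx, key=lambda k: scores[k], reverse=True)
    return sorted(text_idx + top_visual[:n_visual])


# Toy example: positions 0-3 are visual tokens, positions 4-5 are text.
attn = [
    [0.05, 0.40, 0.10, 0.25, 0.10, 0.10],
    [0.05, 0.30, 0.05, 0.40, 0.10, 0.10],
]
is_text = [False, False, False, False, True, True]
kept = select_kv_to_keep(attn, is_text, budget_ratio=0.5)
print(kept)  # [1, 4, 5]
```

With a budget of three of the six positions, both text entries survive and the single remaining slot goes to visual position 1, which receives the most attention from the text queries.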


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
