arXiv

CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models

Title: LiteLVLM: A Training-Free Approach to Efficient Pixel Grounding via Text-Guided Token Pruning in Large Vision-Language Models

Abstract: In the realm of large vision-language models, visual tokens typically account for the bulk of input data, resulting in significant computational burdens. While recent research has focused on pruning redundant or less informative visual tokens to optimize image understanding, these approaches often falter in pixel grounding tasks. This is largely because token relevance in grounding is heavily dependent on the specific input text. Our in-depth examination of CLIP reveals a counterintuitive phenomenon: visual tokens located within referent regions frequently show low similarity to their corresponding textual descriptions. Leveraging this finding, we propose LiteLVLM, a novel, training-free strategy that utilizes text guidance to prune tokens efficiently for pixel grounding inference. LiteLVLM works by inverting the standard ranking of CLIP’s visual-text similarity scores. This reversal ensures that visual tokens encompassing referent regions are preserved, while simultaneously recovering context tokens to facilitate distinct foreground-background differentiation. Comprehensive experiments indicate that LiteLVLM surpasses current state-of-the-art methods by more than 5% across various token budget constraints. Notably, LiteLVLM achieves a 22% increase in speed and a 2.3-fold reduction in memory usage while retaining 90% of the original model’s performance, all without requiring any training or fine-tuning. The code for LiteLVLM is accessible at https://github.com/sejong-rcv/LiteLVLM.
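The inverted-ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that each visual token and the text query have CLIP-style embeddings, and that tokens with the lowest text similarity are the ones to keep. The function names `cosine` and `prune_tokens`, the `context_ratio` parameter, and the rule used here to recover context tokens (keeping a few of the highest-similarity tokens) are hypothetical choices made for illustration, since the abstract does not specify how context tokens are recovered.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prune_tokens(visual_tokens, text_embedding, budget, context_ratio=0.25):
    """Return the indices of the visual tokens to keep, in original order.

    Assumption (illustrative only): a share of the budget, `context_ratio`,
    is reserved for context tokens, chosen here as the highest-similarity
    tokens among those not already kept.
    """
    n = len(visual_tokens)
    budget = max(0, min(budget, n))

    sims = [cosine(tok, text_embedding) for tok in visual_tokens]

    # Inverted ranking: lowest text similarity first.
    ascending = sorted(range(n), key=lambda i: sims[i])

    n_context = int(budget * context_ratio)
    n_referent = budget - n_context

    # Tokens presumed to cover the referent region (lowest similarity).
    referent = ascending[:n_referent]

    # Hypothetical context recovery: top-similarity tokens from the rest.
    remaining = ascending[n_referent:]
    context = remaining[len(remaining) - n_context:] if n_context else []

    return sorted(referent + context)


if __name__ == "__main__":
    text = [1.0, 0.0]
    tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0],
              [1.0, 1.0], [0.1, 1.0], [1.0, 0.1]]
    print(prune_tokens(tokens, text, budget=4))
```

Returning the kept indices in sorted order preserves the original spatial ordering of the tokens, which a downstream model would typically rely on for positional information.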


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
