arXiv

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

Abstract:

Although vision-language models such as CLIP have demonstrated exceptional performance in open-vocabulary scenarios, their utility remains largely restricted to image-level tasks, with significant challenges persisting in dense prediction. Recent studies typically blame the self-attention layers within the final block for these limitations, achieving notable improvements by altering standard query-key attention into self-correlation mechanisms, such as query-query and key-key attention. However, these approaches neglect the properties of cross-correlation attention (query-key), which is crucial for capturing rich spatial correspondences.
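To make the distinction between the attention variants concrete, here is a minimal sketch in plain Python. It is not taken from the paper's code: the toy projections `Q` and `K` and the helper `attention_weights` are hypothetical, and the sketch only shows which pair of projections is correlated in each variant.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(a, b):
    """Row-wise softmax of scaled dot products between token sets a and b."""
    scale = 1.0 / math.sqrt(len(a[0]))
    return [
        softmax([scale * sum(x * y for x, y in zip(ai, bj)) for bj in b])
        for ai in a
    ]

# Toy projections for 3 patch tokens with 2 channels (made-up numbers).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]

qk = attention_weights(Q, K)  # cross-correlation: standard query-key attention
qq = attention_weights(Q, Q)  # self-correlation: query-query attention
kk = attention_weights(K, K)  # self-correlation: key-key attention
```

In this toy setup, token 0 attends mostly to token 1 under query-key attention, while under query-query attention it weights itself more heavily than token 1, which illustrates how the two formulations can distribute attention differently.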

In this study, we demonstrate that the cross-correlation within the self-attention of CLIP’s non-final layers also possesses localization capabilities. To exploit this, we introduce the Residual Cross-correlation Self-attention (RCS) module. This module utilizes cross-correlation self-attention from intermediate layers to reshape the attention dynamics in the final block, thereby effectively reorganizing spatial information and unlocking CLIP’s potential for dense vision-language inference.
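The abstract does not say how the intermediate-layer attention is combined with the final block. One possible reading, sketched below under stated assumptions, averages the query-key attention maps of several intermediate layers and blends the result with the final-layer map. The helpers `mean_maps` and `blend` and the weight `lam` are hypothetical, and the numbers are made up.

```python
def mean_maps(maps):
    """Element-wise average of equally shaped attention maps."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[i][j] for m in maps) / n for j in range(cols)]
        for i in range(rows)
    ]

def blend(final_attn, residual_attn, lam):
    """Convex combination of two maps; rows stay normalized if both inputs are."""
    return [
        [(1.0 - lam) * f + lam * r for f, r in zip(frow, rrow)]
        for frow, rrow in zip(final_attn, residual_attn)
    ]

# Made-up 2x2 query-key attention maps from two intermediate layers.
intermediate = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
# Made-up final-layer attention map (uniform, i.e. spatially uninformative).
final = [[0.5, 0.5], [0.5, 0.5]]

residual = mean_maps(intermediate)
mixed = blend(final, residual, lam=0.5)  # assumed blending weight
```

Because the blend is a convex combination of row-normalized maps, each row of `mixed` still sums to one, so it can stand in for the original attention weights without renormalization.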

Additionally, to improve focus on same-category regions and ensure local consistency, we present the Semantic Feedback Refinement (SFR) module. This component employs semantic segmentation maps to further refine attention scores. By combining these two innovations, our proposed method, ResCLIP, serves as a plug-and-play add-on that can be seamlessly integrated into existing frameworks, yielding substantial performance gains in dense vision-language inference. Comprehensive experiments on various standard benchmarks confirm that our approach outperforms current state-of-the-art training-free methods, underscoring its efficacy. The code is accessible at https://github.com/yvhangyang/ResCLIP.
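The abstract gives no formula for the refinement step, so the following is only an illustrative sketch of the general idea of using a segmentation map to sharpen attention. It assumes each patch already has a predicted label from a first-pass segmentation, and it up-weights attention between patches that share a label before renormalizing. The function `refine_with_labels` and the factor `boost` are hypothetical.

```python
def refine_with_labels(attn, labels, boost=2.0):
    """Up-weight attention between same-label patches, then renormalize rows."""
    refined = []
    for i, row in enumerate(attn):
        scaled = [
            w * (boost if labels[j] == labels[i] else 1.0)
            for j, w in enumerate(row)
        ]
        total = sum(scaled)
        refined.append([w / total for w in scaled])
    return refined

# Made-up uniform attention over 4 patches and made-up first-pass labels.
attn = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
labels = ["cat", "cat", "sky", "sky"]

refined = refine_with_labels(attn, labels)
```

With these toy inputs, each patch ends up placing twice as much weight on patches of its own predicted category as on the others, while every row remains a valid distribution.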


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
