Intra-Modal Neighbors Never Lie: Rectifying Inter-Modal Noisy Correspondence via Graph-Based Intra-Modal Reasoning

June 4, 2026 · Yang Liu, Wentao Feng, Shu-Dong Huang, Yalan Ye, Jiancheng Lv

Abstract:

While large-scale datasets harvested from the web have significantly advanced cross-modal retrieval, they are inherently plagued by noisy correspondences that hinder model generalization. Current approaches typically tackle this issue by either filtering out noise or identifying alternative labels, yet these strategies largely adhere to a "Discrete Selection" framework. We contend that depending on a single discrete proxy leads to Discretization Error and Single-Point Fragility. To address these shortcomings, we introduce Intra-modal Neighbor-aware Noise Rectification (IN2R), a novel framework that transitions the focus from finding substitutes to constructing reliable supervision targets. By capitalizing on the intrinsic geometric stability of intra-modal data, IN2R utilizes a Graph Refiner to conduct relational reasoning on neighbors drawn from a dynamic Cross-Model Memory. Rather than transmitting discrete labels, our approach generates a continuous, soft prototype that embodies the consensus of the local semantic neighborhood, thereby effectively correcting inter-modal misalignment. Comprehensive evaluations on Flickr30K, MS-COCO, and CC152K show that IN2R substantially surpasses existing state-of-the-art techniques. The source code and pre-trained models can be accessed at https://github.com/liuyyy111/IN2R.
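
The contrast between a single discrete substitute and a continuous soft prototype can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the Graph Refiner or the memory design, so the sketch swaps in plain softmax-weighted averaging over the k nearest intra-modal neighbors. The function names, the `k` and `temperature` parameters, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def soft_prototype(query_img, memory_imgs, memory_txts, k=2, temperature=0.1):
    """Build a soft text target for an image from its intra-modal neighbors.

    Hypothetical stand-in for graph-based refinement: neighbors are found
    in image space only, and their paired text embeddings are averaged
    with softmax weights derived from the image-image similarities.
    """
    # Intra-modal step: compare the query image with stored images only.
    sims = [cosine(query_img, m) for m in memory_imgs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    # Numerically stable softmax over the selected neighbors.
    logits = [sims[i] / temperature for i in top]
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Continuous target: weighted consensus of the neighbors' paired texts.
    dim = len(memory_txts[0])
    proto = [
        sum(w * memory_txts[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return l2_normalize(proto), top, weights


# Toy memory of paired (image, text) embeddings; values are made up.
memory_imgs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
memory_txts = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]

query_img = [1.0, 0.05]   # image whose caption is suspected to be noisy
noisy_txt = [0.0, 1.0]    # mismatched caption embedding

proto, neighbors, weights = soft_prototype(query_img, memory_imgs, memory_txts)
print("neighbors:", neighbors)
print("prototype closer to clean text than to noisy text:",
      cosine(proto, memory_txts[0]) > cosine(proto, noisy_txt))
```

Because the target is a weighted blend rather than one retrieved caption, a single bad neighbor shifts the prototype only in proportion to its weight, which is the intuition behind avoiding single-point fragility.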

Source: arXiv