arXiv

COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations

Abstract

Composed Image Retrieval (CIR) poses a significant challenge in the field of information retrieval, aiming to locate specific images using combined multimodal inputs. While CIR methodologies have advanced recently, existing approaches frequently fail to adequately handle scenarios where images share visual similarities but possess distinct attributes. This oversight can negatively impact both the fusion of multimodal features and the accuracy of similarity modeling. To address this gap, we propose a unified representation for cross-modal features that relies on attribute prototypes.

Realizing such a representation is difficult, however, because of three primary challenges: semantic entanglement at the attribute level, modality inconsistencies, and a lack of supervised signals. To overcome these hurdles, we present COMBINER, a network for COMposed image retrieval guided By attrIbute-based NEighbor Relations. Our framework incorporates three key components:

  1. An Adaptive Semantic Disentanglement module that separates attribute features derived from multimodal primitive features.
  2. A Unified Prototype-based Composition module designed to construct Cross-modal Unified Prototypes (CUP) and streamline the composition of multimodal features.
  3. A Dual Relations Modeling module that extracts both pairwise and neighbor relationships based on attribute similarity.

COMBINER distinguishes itself as the first study to specifically address the phenomenon of samples that are visually similar yet attribute-unrelated, a limitation in traditional neighbor-relation-based CIR methods. By utilizing an attribute prototype-based similarity metric, our approach enables a more precise interpretation of semantic relationships among samples. Extensive experiments across three benchmark datasets validate the effectiveness of COMBINER. The source code for our method will be made available at https://github.com/Lee-zixu/COMBINER.
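
To illustrate why an attribute prototype-based similarity can separate samples that look alike but differ in attributes, here is a minimal toy sketch. It is not the COMBINER implementation: the function names, the softmax-over-cosine assignment, the temperature value, and the hand-made 3-dimensional features are all assumptions chosen for illustration. Each feature is mapped to a distribution over attribute prototypes, and two samples are compared through those distributions rather than through their raw features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def attribute_profile(feature, prototypes, temperature=0.1):
    """Softmax over cosine similarities to each prototype (assumed form)."""
    logits = [cosine(feature, p) / temperature for p in prototypes]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_similarity(f1, f2, prototypes, temperature=0.1):
    """Compare two samples through their attribute profiles."""
    p1 = attribute_profile(f1, prototypes, temperature)
    p2 = attribute_profile(f2, prototypes, temperature)
    return cosine(p1, p2)

# Hypothetical setup: dims 0 and 1 carry two attributes, dim 2 carries
# attribute-irrelevant appearance that dominates the raw feature.
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sample_a = [0.3, 0.0, 1.0]   # attribute 0, appearance +1
sample_b = [0.0, 0.3, 1.0]   # attribute 1, same appearance as sample_a
sample_c = [0.3, 0.0, -1.0]  # attribute 0, opposite appearance

raw_ab = cosine(sample_a, sample_b)
raw_ac = cosine(sample_a, sample_c)
proto_ab = prototype_similarity(sample_a, sample_b, prototypes)
proto_ac = prototype_similarity(sample_a, sample_c, prototypes)

print(f"raw(a, b)={raw_ab:.3f}  prototype(a, b)={proto_ab:.3f}")
print(f"raw(a, c)={raw_ac:.3f}  prototype(a, c)={proto_ac:.3f}")
```

In this toy setup, samples a and b are close in the raw feature space yet receive a low prototype-based similarity, while a and c are far apart in the raw space yet share the same attribute profile. That is the kind of visually similar but attribute-unrelated case the abstract describes; how COMBINER actually builds its prototypes and neighbor relations is specified in the paper and the linked repository, not here.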


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
