arXiv

KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models

Abstract: Multimodal learning systems frequently rely on vision-language foundation models like SigLIP and CLIP for their robust representations. Although these models are commonly benchmarked by downstream task performance, such metrics rarely reveal the structural differences between their internal representations. This study addresses this gap by investigating Contrastive Embedding Clustering, a task aimed at uncovering subsets of samples that cluster weakly within one representation yet cluster strongly under another. To this end, we introduce Kernel Optimization for Discrepancy Analysis (KODA), a kernel-based framework for the comparison and alignment of contrastive representations. KODA builds unified multimodal kernels by composing modality-specific kernels and frames the identification of discrepancies as a constrained optimization problem. This approach seeks to isolate coherent structures present in a target representation while simultaneously diminishing coherence within a reference representation. Consequently, the method produces interpretable discrepancy directions that highlight specific modality interactions and sample subsets. To scale KODA to large vision-language datasets, we implement randomized low-dimensional approximations of joint kernels, employing techniques such as Random Fourier Features for shift-invariant kernels. Our empirical results demonstrate that KODA consistently uncovers interpretable discrepancy structures across various vision-language representations and effectively yields sample subsets suitable for representation alignment. The source code can be accessed at https://github.com/yokiwuuu/KODA.
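
The scaling step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the KODA repository. It assumes, purely for illustration, that each modality uses a Gaussian kernel and that the joint kernel is the product of the image kernel and the text kernel; the abstract states neither choice. Under that assumption the product is itself a shift-invariant kernel on the concatenated embeddings, so a single set of Random Fourier Features approximates it. The function names `joint_rff` and `exact_product_kernel` and the bandwidth parameters are hypothetical.

```python
import numpy as np


def joint_rff(img, txt, num_features, bw_img, bw_txt, seed=0):
    """Random Fourier Features for an ASSUMED joint kernel
    k = k_img * k_txt, where each factor is a Gaussian kernel
    exp(-||x - y||^2 / (2 * bw^2)). Illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Frequencies are drawn from the spectral density of each Gaussian kernel.
    w_img = rng.normal(scale=1.0 / bw_img, size=(img.shape[1], num_features))
    w_txt = rng.normal(scale=1.0 / bw_txt, size=(txt.shape[1], num_features))
    # One shared random phase per feature ties the two modalities together,
    # so the inner product of features estimates the product kernel.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    proj = img @ w_img + txt @ w_txt + phase
    return np.sqrt(2.0 / num_features) * np.cos(proj)


def exact_product_kernel(img, txt, bw_img, bw_txt):
    """Exact n x n product of per-modality Gaussian kernels (for comparison)."""

    def gauss(x, bw):
        sq = np.sum(x ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
        # Clip tiny negative values caused by floating-point round-off.
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * bw ** 2))

    return gauss(img, bw_img) * gauss(txt, bw_txt)
```

Given image embeddings `img` of shape `(n, d_img)` and text embeddings `txt` of shape `(n, d_txt)`, `Z = joint_rff(img, txt, 5000, 1.0, 1.0)` returns an `(n, 5000)` feature matrix whose Gram matrix `Z @ Z.T` approximates `exact_product_kernel(img, txt, 1.0, 1.0)`. Downstream computations can then work with `Z` rather than the full `n x n` kernel, which is the usual motivation for this kind of approximation.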


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
