
Geometry-Preserving Unsupervised Alignment for Heterogeneous Foundation Models


Original (arXiv:2606.04385v1, new announcement): Foundation models have driven rapid progress in computer vision, yet the two dominant paradigms, vision-language foundation models (VLMs) and vision-only foundation models (VFMs), remain only partially compatible. VLMs offer language-grounded semantic alignment but are often visually coarse, while VFMs learn discriminative perceptual geometry but lack semantic grounding. We propose GPUA (Geometry-Preserving Unsupervised Alignment), a framework that integrates the complementary strengths of VFMs and VLMs. Inspired by cross-lingual alignment, GPUA treats VFM features as a visual language and learns an orthogonal mapping that translates the VFM space into the VLM semantic space, preserving geometry and narrowing the modality gap without labels or model parameter updates. GPUA is task-agnostic and requires only feature-level access to pretrained models. Experiments across diverse benchmarks demonstrate improved cross-model compatibility and strong gains in downstream zero-shot recognition and segmentation with negligible overhead. Code is available at https://github.com/Yuteam14/GPUA

Rewrite: Foundation models have accelerated progress in computer vision, but the two leading paradigms, vision-language foundation models (VLMs) and vision-only foundation models (VFMs), remain only partially compatible. VLMs provide semantic alignment grounded in language but are often visually coarse. Conversely, VFMs learn discriminative perceptual geometry yet lack semantic grounding. To bridge this divide, we introduce GPUA (Geometry-Preserving Unsupervised Alignment), a framework that combines the complementary strengths of both model types. Drawing on cross-lingual alignment, GPUA treats VFM features as a visual language and learns an orthogonal mapping that translates the VFM feature space into the VLM semantic space. Because the mapping is orthogonal, it preserves the geometry of the VFM features while narrowing the modality gap. The process requires neither labels nor updates to model parameters. GPUA is task-agnostic and needs only feature-level access to the pretrained models. Experiments across diverse benchmarks show improved cross-model compatibility and strong gains in downstream zero-shot recognition and segmentation with negligible overhead. The source code is available at https://github.com/Yuteam14/GPUA
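
The abstract does not spell out how the orthogonal mapping is learned, so the following is only an illustrative sketch of the general idea, not the GPUA procedure. It uses the classical orthogonal Procrustes solution, a standard tool in cross-lingual embedding alignment, and assumes (hypothetically) that both models have been run on the same unlabeled images to give row-aligned feature matrices of equal dimension. The names `fit_orthogonal_map`, `vfm_feats`, and `vlm_feats` are invented for this example, and the features are synthetic.

```python
import numpy as np


def fit_orthogonal_map(src, tgt):
    """Return the orthogonal W minimizing ||src @ W - tgt||_F (orthogonal Procrustes)."""
    # The minimizer is U @ Vt, where U, Vt come from the SVD of src^T tgt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Synthetic stand-ins for features (assumption: same images, same dimension).
rng = np.random.default_rng(0)
n, d = 200, 16
vfm_feats = rng.normal(size=(n, d))

# Simulate a "VLM space" as a rotated, slightly noisy copy of the VFM space.
true_rot, _ = np.linalg.qr(rng.normal(size=(d, d)))
vlm_feats = vfm_feats @ true_rot + 0.01 * rng.normal(size=(n, d))

W = fit_orthogonal_map(vfm_feats, vlm_feats)
mapped = vfm_feats @ W

# An orthogonal map leaves all pairwise inner products (and distances) unchanged,
# which is the sense in which the source geometry is preserved.
gram_preserved = np.allclose(mapped @ mapped.T, vfm_feats @ vfm_feats.T)
print("W orthogonal:", np.allclose(W.T @ W, np.eye(d)))
print("Gram matrix preserved:", gram_preserved)
```

Constraining the map to be orthogonal is what keeps the pairwise structure of the source features intact; an unconstrained linear map could fit the target space more closely but would be free to distort that structure.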


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
