Beyond Symmetric Alignment: Spectral Diagnostics of Modality Imbalance in Vision-Language Models in the Medical Domain
Title: Diagnosing Modality Imbalance in Medical Vision-Language Models: A Spectral Approach Beyond Symmetric Alignment
Abstract
Despite their potential, Vision-Language Models (VLMs) often falter on medical image-text pairs, yet robust tools for identifying the root causes of these failures are scarce. Existing measures of representation alignment are inherently symmetric: they collapse the two modalities into a single score, obscuring which modality is responsible for degraded cross-modal performance. To address this limitation, we propose the Spectral Alignment Score (SAS), an asymmetric diagnostic metric. SAS projects the representations of both modalities onto the principal eigenbasis of an anchor modality and computes an eigenvalue-weighted correlation for each eigenmode. Each choice of anchor yields a directional score, and the difference between the two directional scores quantifies the information imbalance between modalities.
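To make the idea of a directional, eigenvalue-weighted score concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact SAS formula, so the function name `directional_sas`, the use of sample-space Gram matrices, the Pearson correlation per eigenmode, and the normalization of the weights are all assumptions made for illustration. The embeddings are synthetic stand-ins rather than data from the paper.

```python
import numpy as np

def directional_sas(anchor, other, k=None, eps=1e-12):
    """Illustrative anchor -> other score (hypothetical formulation).

    anchor, other: paired embeddings of shape (n_samples, dim); the two
    dims may differ because everything is compared in sample space.
    """
    # Center each modality so that correlations are well defined.
    A = anchor - anchor.mean(axis=0, keepdims=True)
    B = other - other.mean(axis=0, keepdims=True)

    # Sample-space Gram matrices (n x n) for both modalities.
    K_a = A @ A.T
    K_b = B @ B.T

    # Principal eigenbasis of the anchor modality, sorted by eigenvalue.
    evals, evecs = np.linalg.eigh(K_a)  # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Keep numerically non-zero modes, optionally only the top k.
    keep = evals > 1e-10 * evals[0]
    if k is not None:
        keep[k:] = False
    evals, evecs = evals[keep], evecs[:, keep]

    # Project both modalities onto the anchor eigenmodes.
    P_a = K_a @ evecs
    P_b = K_b @ evecs
    P_a = P_a - P_a.mean(axis=0)
    P_b = P_b - P_b.mean(axis=0)

    # Pearson correlation per eigenmode, weighted by the eigenvalue share.
    num = (P_a * P_b).sum(axis=0)
    den = np.linalg.norm(P_a, axis=0) * np.linalg.norm(P_b, axis=0) + eps
    weights = evals / evals.sum()
    return float(weights @ (num / den))

# Synthetic stand-ins: "text" keeps only part of the "image" structure.
rng = np.random.default_rng(0)
img = rng.normal(size=(50, 16))
txt = img[:, :4] + 0.1 * rng.normal(size=(50, 4))

sas_img_anchor = directional_sas(img, txt)  # image eigenbasis as anchor
sas_txt_anchor = directional_sas(txt, img)  # text eigenbasis as anchor
imbalance = sas_img_anchor - sas_txt_anchor
```

Under these assumptions, a modality scored against itself returns a value of approximately 1, and the sign of `imbalance` indicates which anchor direction aligns better. How that sign maps onto "richer" information depends on the paper's actual definition, which this sketch does not reproduce.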
We integrated SAS into a benchmarking framework that assesses 15 VLMs on both natural and medical image-text datasets, using six alignment metrics and bidirectional retrieval tasks. Our findings reveal that medical images preserve significantly richer structural information than their corresponding clinical reports, a directional asymmetry that the competing metrics do not detect. Furthermore, SAS achieved the highest zero-label correlation with retrieval performance in the medical domain, making it an effective diagnostic tool for clinical applications. The source code is available at https://github.com/iamalegambetti/medical-vlms-assessment.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC





