The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning
Title: Beyond the Loss: Sampling Constraints and Inductive Bias in Contrastive Representation Learning
Abstract:
While contrastive learning stands as a dominant approach in self-supervised representation learning, the precise conditions required for it to successfully recover meaningful latent geometry are not yet fully elucidated. This study establishes a measure-theoretic framework to formalize the "diversity condition," a critical support requirement for positive-pair sampling that is essential for achieving isometric latent recovery. We demonstrate that the conventional full-support von Mises-Fisher distribution satisfies this diversity condition, ensuring that global minimizers of the contrastive loss recover the latent geometry up to an orthogonal transformation. Conversely, when conditional distributions are restricted, non-orthogonal mappings can achieve a strictly lower asymptotic contrastive loss. To address this, we propose a support-corrected variant of Information Noise Contrastive Estimation (InfoNCE). This theoretical adjustment renders orthogonal latent space recovery feasible, although it does not uniquely enforce it. Our experimental validation on synthetic benchmarks confirms these identifiability predictions, while results from CIFAR-10 align with the qualitative hypothesis that architectural inductive bias plays a more pivotal role when sampling diversity is constrained. Collectively, these findings shed light on the interplay between sampling mechanisms and encoder inductive bias within contrastive representation learning.
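The phrase "up to an orthogonal transformation" can be illustrated with a minimal NumPy sketch. It assumes the standard InfoNCE loss with in-batch negatives and cosine similarity, not the support-corrected variant proposed here, whose form the abstract does not give. The function names, the temperature `tau`, the batch size, and the perturb-and-renormalize positive sampler (a rough stand-in for von Mises-Fisher sampling) are all illustrative assumptions.

```python
import numpy as np


def normalize(x):
    """Project rows onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def info_nce(anchors, positives, tau=0.1):
    """Standard InfoNCE with in-batch negatives (illustrative, not the paper's variant)."""
    a = normalize(anchors)
    p = normalize(positives)
    logits = a @ p.T / tau                                # (n, n) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # matching pairs lie on the diagonal


rng = np.random.default_rng(0)

# Hypothetical latents on the unit sphere S^2.
z = normalize(rng.normal(size=(256, 3)))

# Assumption: perturb and re-project to mimic a concentrated conditional
# distribution around each anchor (not exact vMF sampling).
z_pos = normalize(z + 0.1 * rng.normal(size=z.shape))

# A random orthogonal matrix from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

loss_identity = info_nce(z, z_pos)
loss_rotated = info_nce(z @ q, z_pos @ q)
print(loss_identity, loss_rotated)
```

Because the loss depends on the representations only through inner products, applying the same orthogonal matrix to anchors and positives leaves its value unchanged up to floating-point error. The loss therefore cannot distinguish the true latents from an orthogonally transformed copy, which is why recovery can be claimed only up to such a transformation.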
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC


