Global Geometry Is Not Enough for Vision Representations
Abstract:
Representation learning has long assumed that globally well-distributed embeddings are essential for robust, generalizable models. This view has shaped both training objectives and evaluation metrics, effectively treating global geometry as a proxy for representational capability. However, while global geometry captures which elements are present, it often fails to capture how those elements are structured or combined. To examine this shortcoming, we test how well geometric metrics predict compositional binding across a wide range of vision encoders. We find that conventional geometry-based statistics show almost no correlation with compositional binding. In contrast, functional sensitivity, quantified via the input–output Jacobian, is a reliable indicator of this capability. We offer an analytical explanation for this divergence: current loss functions explicitly regulate embedding geometry while leaving the local input–output mapping largely unconstrained. These results indicate that global embedding geometry offers only a partial view of representational competence, and they highlight functional sensitivity as a vital complementary dimension for modeling composite structures.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC