arXiv

A Fiber Criterion for Representation Identifiability in Supervised Learning

June 2, 2026 · Vasileios Sevetlidis


Abstract

Supervised learning evaluates models through their input-output maps. When a predictor is a composition $f=c\circ h$, supervised evidence constrains the overall map $f$ but does not necessarily determine its factorization into a representation $h$ and a head $c$. This study formalizes the resulting representation-level identifiability problem: within a set of admissible representation-head pairs, a property of the representation is identifiable from the resulting predictor if and only if it is constant on the fibers of the projection $(h,c)\mapsto c\circ h$. Equivalently, the property must be well defined as a function of the predictor itself.
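The criterion stated above can be written out explicitly. The notation here is assumed for illustration and is not taken from the paper: $\mathcal{A}$ is the set of admissible pairs, $\mathcal{F}$ the space of predictors, $\pi$ the projection, and $P$ a representation property with values in some set $\mathcal{V}$.

```latex
\[
\pi:\mathcal{A}\to\mathcal{F},\qquad \pi(h,c)=c\circ h,
\qquad \pi^{-1}(f)=\{(h,c)\in\mathcal{A} : c\circ h=f\}.
\]
\[
P \text{ is identifiable from the predictor}
\;\iff\;
\bigl[\,\pi(h,c)=\pi(h',c') \implies P(h)=P(h')\,\bigr]
\;\iff\;
\exists\,\tilde P:\pi(\mathcal{A})\to\mathcal{V}
\text{ such that } P(h)=\tilde P(c\circ h)\ \text{for all } (h,c)\in\mathcal{A}.
\]
```

The last equivalence is the usual statement that a function is constant on the fibers of a map exactly when it factors through that map's image.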

The paper identifies predictor-preserving augmentation as a basic obstruction: auxiliary coordinates are appended to a representation, and the head ignores them. The predictor is then unchanged, yet representation-level properties such as minimality, compression, invariance, equivariance, the presence of nuisance information, or semantic accessibility change. This mechanism shows that the representation identifiability problem is distinct from both optimization and finite-sample estimation.
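The augmentation described above can be sketched numerically. This is a minimal illustration under assumptions, not the paper's construction: the toy linear maps `h` and `c`, the choice of the third input column as "nuisance", and the linear-probe $R^2$ diagnostic in `probe_r2` are all hypothetical names and choices introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # toy inputs; column 2 plays the nuisance role (assumption)

A = np.array([[1.0, 0.5], [-0.3, 2.0]])  # hypothetical representation weights
w = np.array([0.7, -1.2])                # hypothetical head weights

def h(X):
    """Original representation: uses only the first two input columns."""
    return X[:, :2] @ A

def c(Z):
    """Original head: linear readout of the representation."""
    return Z @ w

def h_aug(X):
    """Augmented representation: original features plus the nuisance column."""
    return np.concatenate([h(X), X[:, 2:3]], axis=1)

def c_aug(Z):
    """Augmented head: ignores the appended coordinate."""
    return Z[:, :2] @ w

def probe_r2(Z, target):
    """R^2 of a least-squares linear probe (with intercept) from Z to target."""
    Z1 = np.column_stack([Z, np.ones(len(Z))])
    coef, *_ = np.linalg.lstsq(Z1, target, rcond=None)
    resid = target - Z1 @ coef
    return 1.0 - resid.var() / target.var()

# Both pairs lie in the same fiber: they induce the same predictor on X.
same_predictor = np.allclose(c(h(X)), c_aug(h_aug(X)))

# A representation-level diagnostic nevertheless differs between the two.
r2_plain = probe_r2(h(X), X[:, 2])
r2_aug = probe_r2(h_aug(X), X[:, 2])

print(same_predictor, round(r2_plain, 3), round(r2_aug, 3))
```

In this sketch the two pairs produce the same predictions, while the nuisance column is linearly decodable from `h_aug` and essentially not from `h`, so a probe-based diagnostic is not constant on the fiber.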

Finite-sample diagnostics illustrate the criterion rather than prove it. Exact algebraic examples show that the predictor can remain fixed while representation diagnostics change, and experiments with performance-matched Waterbirds models show that different constraints can select different representations at similar supervised accuracy. These findings indicate that claims about representations require assumptions, objectives, measurements, or inductive biases beyond supervised predictive performance alone.
