Bayes-Sufficient Representations in Supervised Learning
Abstract
Representation learning is often described as retaining the information in the input that is relevant for prediction. This study asks what "relevance" means, precisely, for a fixed supervised decision problem. We call a representation "Bayes-sufficient" for a given joint distribution and loss function if a downstream prediction head can use it to implement a Bayes-optimal action rule. Which information counts as relevant therefore depends on the chosen loss function.
When the Bayes-optimal action is unique with probability one, the central object is the Bayes quotient, which groups together inputs that require the same Bayes-optimal action. A representation is sufficient if it refines this quotient, and it is Bayes-minimal if it is informationally equivalent to the quotient itself. This framework aligns closely with property elicitation. For instance, zero-one loss requires the Bayes class, squared loss requires the conditional mean, Brier loss requires the conditional probability in binary settings, and log loss (or any strictly proper scoring rule) requires the full predictive distribution.
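As an illustration of how the quotient depends on the loss, the following minimal sketch builds the Bayes quotient on a toy finite problem. The four inputs, the values in `eta`, and the helper names `quotient` and `is_sufficient` are hypothetical choices made for this example and are not taken from the paper's experiments.

```python
from collections import defaultdict

# Hypothetical toy problem: four inputs, binary label,
# eta[x] = P(Y = 1 | X = x). Values are illustrative only.
eta = {0: 0.1, 1: 0.3, 2: 0.7, 3: 0.7}

def bayes_class(p):
    # Bayes action under zero-one loss: the more probable class.
    return int(p > 0.5)

def bayes_mean(p):
    # Bayes action under squared loss: the conditional mean E[Y | X = x].
    return p

def quotient(action):
    # Group inputs that share the same Bayes-optimal action.
    cells = defaultdict(set)
    for x, p in eta.items():
        cells[action(p)].add(x)
    return sorted(sorted(cell) for cell in cells.values())

def is_sufficient(rep, action):
    # A representation refines the quotient if every representation
    # value corresponds to exactly one Bayes-optimal action.
    seen = {}
    for x, p in eta.items():
        a = action(p)
        if seen.setdefault(rep(x), a) != a:
            return False
    return True

def coarse(x):
    # Hypothetical one-bit representation: keeps only whether x >= 2.
    return x >= 2

zero_one_cells = quotient(bayes_class)
squared_cells = quotient(bayes_mean)
coarse_ok_zero_one = is_sufficient(coarse, bayes_class)
coarse_ok_squared = is_sufficient(coarse, bayes_mean)
```

In this toy setting the one-bit representation is sufficient under zero-one loss but not under squared loss, because squared loss must distinguish inputs 0 and 1. The identity representation is sufficient for both losses yet minimal for neither, since it separates inputs 2 and 3, which share the same Bayes action.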
To separate sufficiency, minimality, and the retention of non-essential information, we report results from controlled finite experiments, neural network bottleneck training, and a real-world taxonomic refinement experiment using iNaturalist data. In short, for any fixed supervised problem, the underlying distribution and loss function determine the Bayes action; this action defines the quotient, which in turn determines the minimal information necessary for Bayes-optimal prediction.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






