Learning Coherent Representations: A Topological Approach to Interpretability
Abstract:
In deep neural networks, learned representations frequently suffer from a lack of interpretability, as individual features often correspond to scattered and unrelated inputs rather than meaningful concepts. To address this, we introduce the concept of "coherence," a geometric property motivated by neural coding mechanisms observed in the brain, such as those found in grid cells and head direction cells, which respond to contiguous regions of state space. We define a non-negative matrix as coherent if its rows (samples) and columns (features) attend to geometrically clustered counterparts, ensuring that every sample is adequately represented by at least one feature and that every feature is essential to at least one sample.
We demonstrate that coherent matrices create a bounded interleaving between the Vietoris-Rips filtrations of samples and features. This result guarantees that both spaces possess compatible topological structures, thereby enhancing interpretability. For instance, when data is distributed along a circle, coherent features are constrained to tile the circle into contiguous arcs. To enforce this property, we propose "Coh," a differentiable objective function derived from Fréchet variance. Unlike sparsity, which merely limits the number of samples a feature activates on, coherence restricts which samples activate a feature, demanding geometric connectivity in addition to rarity. This approach produces not only interpretable features but also an interpretable feature space. We validate the effectiveness of Coh through experiments on synthetic data and rotated MNIST datasets using an auto-encoder, as well as on language data using BERT token embeddings.
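The contrast between sparsity and coherence can be illustrated with a small numerical sketch. This is not the Coh objective itself, whose exact form is not given in the abstract. It only assumes that samples live in Euclidean space, where the Fréchet mean reduces to the weighted average, and it uses the ambient Euclidean distance rather than distance along the circle. The names `weighted_frechet_variance` and `feature_spread` are hypothetical helpers introduced for this illustration.

```python
import numpy as np


def weighted_frechet_variance(points, weights):
    """Weighted Fréchet variance of Euclidean points.

    In Euclidean space the Fréchet mean is the weighted average, so the
    variance is the weighted mean squared distance to that average.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        # Assumption: a feature that never activates contributes zero spread.
        return 0.0
    w = weights / total
    mean = w @ points
    sq_dist = ((points - mean) ** 2).sum(axis=1)
    return float(w @ sq_dist)


def feature_spread(samples, activations):
    """Per-feature spread for a non-negative (n_samples, n_features) matrix."""
    activations = np.asarray(activations, dtype=float)
    return np.array([
        weighted_frechet_variance(samples, activations[:, j])
        for j in range(activations.shape[1])
    ])


# Twelve samples evenly spaced on the unit circle.
angles = 2 * np.pi * np.arange(12) / 12
samples = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Two equally sparse features: each activates on exactly three samples.
activations = np.zeros((12, 2))
activations[[0, 1, 2], 0] = 1.0  # contiguous arc
activations[[0, 4, 8], 1] = 1.0  # scattered around the circle

spread = feature_spread(samples, activations)
print(spread)
```

Both features are equally sparse, yet the feature supported on a contiguous arc has a much smaller spread than the scattered one. A sparsity penalty cannot tell the two apart, whereas a variance-based penalty of this kind can.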
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



