Emergent Ordinal Geometry in Transformers Trained on Local Comparisons
Title: Emergent Ordinal Geometry in Transformers Trained on Local Comparisons
Original: arXiv:2606.01269v1 Announce Type: new Abstract: Transitive inference is the challenge of inferring that A < C from knowing only adjacent relations (A < B, B < C). It is solved by humans and animals not through logical chaining but via an analogue mental number line, whose signature is the symbolic distance effect: distant comparisons are easier than nearby ones. We ask whether Transformers acquire the same primitive, training small models exclusively on adjacent comparisons from a hidden total order and evaluating generalization to unseen distant pairs. We find that out-of-distribution generalization emerges alongside a striking geometric reorganization: entity embeddings collapse onto a one-dimensional manifold whose principal axis recovers the hidden rank order with near-perfect fidelity, and this structure is sensitive to optimization in ways that produce grokking-like transient dynamics. Critically, even when accuracy is at ceiling, decision confidence and geometric separation both scale monotonically with rank distance, directly mirroring the symbolic distance effect observed across decades of behavioural experiments on humans, primates, and rodents. These results ground a 50-year-old behavioural regularity in the geometry of learned representations, offering a mechanistic account of transitive inference that bridges cognitive science and modern neural networks.
Rewrite: Title: Emergent Ordinal Geometry in Transformers Trained on Local Comparisons
Original: arXiv:2606.01269v1 Announce Type: new Abstract: Transitive inference is the task of deducing that A < C from knowledge of adjacent relations only (A < B, B < C). Humans and other animals solve it not by chaining logical steps but through an internal analogue number line, whose signature is the symbolic distance effect: comparing distant items is easier than comparing close ones. We investigate whether Transformers develop the same primitive. We train compact models exclusively on adjacent pairs drawn from a hidden total order and test whether they generalize to unseen, distant pairs. Out-of-distribution generalization coincides with a marked geometric reorganization: entity embeddings collapse onto a one-dimensional manifold whose principal axis recovers the hidden rank order with near-perfect fidelity, and the formation of this structure is sensitive to optimization in ways that produce grokking-like transient dynamics. Notably, even when accuracy is at ceiling, both decision confidence and geometric separation increase monotonically with rank distance, mirroring the symbolic distance effect documented in decades of behavioral studies of humans, primates, and rodents. These results ground a 50-year-old behavioral regularity in the geometry of learned representations, offering a mechanistic account of transitive inference that connects cognitive science with modern neural networks.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
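The protocol the abstract describes (train on adjacent pairs only, test on unseen distant pairs, then probe the embedding geometry) can be illustrated with a small NumPy sketch. This is not the authors' setup and not a Transformer. It assumes a toy stand-in in which each entity has a d-dimensional embedding, a fixed random readout direction maps it to a scalar score, and the logit for "a < b" is the score difference. The function name run_toy, all hyperparameters, and the choice of plain gradient descent are hypothetical. Because every update in this toy model moves embeddings along the readout direction, the one-dimensional structure is largely built in, so the sketch shows how the three measurements could be computed rather than reproducing the emergence result.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_toy(n=8, d=4, steps=500, lr=0.25, seed=0):
    """Toy linear-score model (an assumption, not the paper's architecture)."""
    rng = np.random.default_rng(seed)

    # Hidden total order: order[r] is the entity id that holds rank r.
    order = rng.permutation(n)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)

    # Training set: adjacent ranks only, in both presentation orders.
    lo, hi = order[:-1], order[1:]
    a = np.concatenate([lo, hi])
    b = np.concatenate([hi, lo])
    y = (rank[a] < rank[b]).astype(float)  # label for "a < b"

    E = 0.01 * rng.normal(size=(n, d))     # entity embeddings (learned)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)                 # fixed unit readout direction

    for _ in range(steps):
        s = E @ w                          # scalar score per entity
        err = sigmoid(s[b] - s[a]) - y     # d(cross-entropy)/d(logit)
        grad_s = np.zeros(n)
        np.add.at(grad_s, b, err)          # accumulate over repeated indices
        np.add.at(grad_s, a, -err)
        E -= lr * np.outer(grad_s, w)      # chain rule through s = E @ w

    # Evaluate on every unordered pair of distinct entities.
    s = E @ w
    ia, ib = np.triu_indices(n, k=1)
    dist = np.abs(rank[ia] - rank[ib])
    truth = rank[ia] < rank[ib]
    logit = s[ib] - s[ia]
    correct = (logit > 0) == truth
    # Confidence assigned to the correct answer.
    conf = sigmoid(np.where(truth, logit, -logit))

    held_out = dist > 1                    # pairs never seen in training
    conf_by_dist = np.array([conf[dist == k].mean() for k in range(1, n)])

    # Geometry probe: project centered embeddings onto the first principal axis.
    X = E - E.mean(axis=0)
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ Vt[0]
    pc1_ranks = np.argsort(np.argsort(pc1))
    # Spearman correlation = Pearson correlation of ranks; sign of PC1 is arbitrary.
    rho = abs(np.corrcoef(pc1_ranks, rank)[0, 1])

    return {
        "held_out_accuracy": correct[held_out].mean(),
        "confidence_by_distance": conf_by_dist,
        "pc1_rank_spearman": rho,
        "pc1_explained_variance": S[0] ** 2 / np.sum(S ** 2),
    }

if __name__ == "__main__":
    for name, value in run_toy().items():
        print(name, value)
```

In this sketch, confidence_by_distance plays the role of the symbolic distance effect: entry k-1 is the mean probability assigned to the correct answer for pairs whose ranks differ by k. Replacing the linear scorer with an actual Transformer would leave the data split and the three probes unchanged.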




