arXiv

Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness

Abstract: While the influence of positional embeddings (PEs) on the performance and robustness of Vision Transformers (ViTs) is widely recognized, their specific role in shaping internal spatial representations remains poorly understood. This study investigates how different PE types affect the representational geometry of ViTs and links these geometric changes to model resilience against distribution shifts that disrupt visual content. To measure spatial structure within token representations, we propose a new metric: Spatial Similarity Distance Correlation (SSDC). Our analysis reveals that ViTs trained without PEs still develop non-trivial spatial structure; however, this structure is content-dependent and disintegrates when tokens are permuted. Conversely, all examined PE types (learned absolute, sinusoidal, and rotary encodings) drive a consistent shift toward an index-based spatial organization. Consequently, the representations in these models remain stable under content-disrupting perturbations, and the models are significantly more robust to such distribution shifts. Furthermore, although different PEs produce distinct depth-wise trajectories of spatial structure, their robustness characteristics are largely comparable, with only minor variations across encoding schemes. This suggests that resilience depends more on the existence of a stable positional reference frame than on the particular encoding mechanism employed. These findings provide a geometric explanation for how positional encodings shape internal representations, offering insights for the principled development of future encoding strategies.
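
The abstract names SSDC but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: the Pearson correlation between pairwise cosine similarity of token representations and the negated Euclidean distance between their patch positions on the grid. The function names (`ssdc_sketch`, `grid_distances`, `toy_position_tokens`), the choice of cosine similarity, and the toy tokens are all assumptions made for this example and may differ from the paper's actual definition.

```python
import numpy as np


def grid_distances(grid_h: int, grid_w: int) -> np.ndarray:
    """Pairwise Euclidean distances between patch positions (row-major order)."""
    rows, cols = np.divmod(np.arange(grid_h * grid_w), grid_w)
    coords = np.stack([rows, cols], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def ssdc_sketch(tokens: np.ndarray, grid_h: int, grid_w: int) -> float:
    """Hypothetical SSDC-style score (an assumption, not the paper's definition).

    tokens: array of shape (grid_h * grid_w, dim), one row per patch token.
    Returns the Pearson correlation between cosine similarity and negated
    grid distance over all unordered token pairs. Higher values mean that
    spatially close tokens tend to have more similar representations.
    """
    n = grid_h * grid_w
    if tokens.shape[0] != n:
        raise ValueError("expected one token per grid position")
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    similarity = unit @ unit.T
    distance = grid_distances(grid_h, grid_w)
    upper = np.triu_indices(n, k=1)  # each unordered pair once, no diagonal
    return float(np.corrcoef(similarity[upper], -distance[upper])[0, 1])


def toy_position_tokens(grid_h: int, grid_w: int) -> np.ndarray:
    """Toy tokens that encode only position: [1, row / grid_h, col / grid_w]."""
    rows, cols = np.divmod(np.arange(grid_h * grid_w), grid_w)
    return np.stack([np.ones(grid_h * grid_w), rows / grid_h, cols / grid_w], axis=1)


if __name__ == "__main__":
    tokens = toy_position_tokens(4, 4)
    rng = np.random.default_rng(0)
    shuffled = tokens[rng.permutation(len(tokens))]
    print("position-aligned tokens:", ssdc_sketch(tokens, 4, 4))
    print("shuffled tokens:        ", ssdc_sketch(shuffled, 4, 4))
```

In this toy setup, shuffling the rows detaches the representations from their grid indices, so the score should drop. That loosely mirrors the permutation test described in the abstract, although the paper permutes model inputs rather than synthetic tokens.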


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
