arXiv

Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group

Title: Preserving Exact Equivariance During Training Enables Zero-Shot Generalization Across Symmetry Groups

Abstract: A latent world model composed of an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry in its training loss. When the world's dynamics genuinely respect a symmetry group $G$ acting on latents through an orthogonal representation $\rho(g)$, the one-step relative mean squared error (relMSE) is exactly invariant across the entire group: writing $x'$ for the true next state of $x$, equivariance gives $f(E(g\cdot x)) - E(g\cdot x') = \rho(g)\,[f(E(x)) - E(x')]$, and $\rho(g)$ preserves norms. Consequently, fitting the dynamics on a restricted slice of orientations determines the model's behavior on the whole orbit. We verify this end-to-end at laptop scale (CPU/MPS, fully seeded).
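A minimal NumPy sketch of the invariance argument, under toy assumptions that are not the paper's setup: observations and latents both live in $\mathbb{R}^2$, $G = \mathrm{SO}(2)$ acts on both through the rotation matrix `rot(theta)` (standing in for $\rho(g)$), and `encoder`, `predictor`, and `world_step` are hypothetical hand-built equivariant maps. The predictor deliberately disagrees with the dynamics, so relMSE is nonzero, yet rotating the whole batch leaves it unchanged up to rounding.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix: the orthogonal representation rho(g) of g in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def encoder(x):
    # Radial gating: scales each row vector by a function of its norm, so E(R x) = R E(x).
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return x * np.tanh(r)

def predictor(z):
    # Fixed rotation (commutes with SO(2)) times a norm-dependent gain: f(R z) = R f(z).
    r = np.linalg.norm(z, axis=-1, keepdims=True)
    return (z @ rot(0.3).T) * (1.0 + 0.1 * r)

def world_step(x):
    # Hypothetical ground-truth dynamics that are themselves SO(2)-equivariant.
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return (x @ rot(0.5).T) / (1.0 + 0.2 * r)

def rel_mse(x):
    # One-step relMSE in latent space: ||f(E(x)) - E(x')||^2 / ||E(x')||^2 over the batch.
    pred = predictor(encoder(x))
    target = encoder(world_step(x))
    return np.sum((pred - target) ** 2) / np.sum(target ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 2))  # batch of row vectors
base = rel_mse(x)
for theta in np.linspace(0.0, 2 * np.pi, 7):
    # Rotate every observation by g, i.e. x -> rho(g) x for row vectors.
    print(f"theta={theta:.2f}  relMSE={rel_mse(x @ rot(theta).T):.12f}  (identity: {base:.12f})")
```

Every printed relMSE should agree with the identity value to roughly machine precision, because each step of the computation commutes with the rotation and the final ratio depends only on norms.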

[A] The symmetry survives a real Muon/AdamW + EMA + VICReg training run: after optimization, the equivariance residual of the composed encode-then-predict map is $\sim 10^{-6}$. Because the symmetry is built into the model rather than learned, it holds not just at initialization but under any optimizer.
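Why the residual is independent of the optimizer can be seen in a toy setting. The sketch below is hypothetical and is not the paper's Muon/AdamW + VICReg pipeline: on $\mathbb{R}^2$ under SO(2), every linear map of the form $aI + bJ$ (with $J$ the 90-degree rotation) commutes with all rotations, so an encoder and predictor built from such maps plus norm gating stay equivariant for *any* parameter values. Drawing fresh random parameters stands in for wherever an optimizer might land; an unconstrained dense-plus-tanh model is shown for contrast.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation; commutes with every rot(theta)

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def equivariant_model(params, x):
    # Hypothetical encoder-then-predictor: maps of the form (a*I + b*J), plus norm gating.
    a1, b1, a2, b2 = params
    z = x @ (a1 * np.eye(2) + b1 * J).T                      # "encoder" linear part
    z = z * np.tanh(np.linalg.norm(z, axis=-1, keepdims=True))  # equivariant nonlinearity
    return z @ (a2 * np.eye(2) + b2 * J).T                   # "predictor"

def generic_model(W, x):
    # Unconstrained baseline: a dense matrix followed by an elementwise tanh.
    return np.tanh(x @ W.T)

def equivariance_residual(model, x, thetas):
    # Worst relative gap ||model(g x) - rho(g) model(x)|| / ||model(x)|| over the given angles.
    out = model(x)
    worst = 0.0
    for t in thetas:
        R = rot(t)
        err = np.linalg.norm(model(x @ R.T) - out @ R.T) / np.linalg.norm(out)
        worst = max(worst, err)
    return worst

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 2))
thetas = np.linspace(0.1, 2 * np.pi, 16)
for draw in range(3):
    params = rng.normal(size=4)   # arbitrary parameter values an optimizer might produce
    W = rng.normal(size=(2, 2))
    eq = equivariance_residual(lambda v: equivariant_model(params, v), x, thetas)
    ne = equivariance_residual(lambda v: generic_model(W, v), x, thetas)
    print(f"draw {draw}: equivariant residual {eq:.1e}   generic residual {ne:.1e}")
```

In float64 the equivariant residual sits near $10^{-16}$ for every draw, while the generic model's is of order one; the paper's reported $\sim 10^{-6}$ reflects its own, more complex pipeline and numeric precision, which this sketch does not reproduce.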

[B] One-step error is flat to five digits across the group. A non-equivariant baseline of the same hypothesis class, by contrast, fits the training slice but fails out of distribution, with error ratios of $\times 1.00$ for VN versus $\times 13.8$ for the baseline in 2D, $\times 17.2$ in 3D, and $\times 157$ over the full $\mathrm{SE}(3)$ hierarchy. The equivariant model is also $4.5$ to $7.4\times$ smaller.
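The slice-to-orbit effect can be illustrated with a deliberately simple stand-in, not the paper's VN architecture: a 1-nearest-neighbour regressor as the non-equivariant baseline, and an equivariant version of the same regressor obtained by canonicalization (rotate each input to angle 0, predict there, rotate the prediction back). Both are fitted only on orientations in $[0, 0.25]$ rad. The dynamics, sample sizes, and the "worst rotated relMSE / in-slice relMSE" ratio are illustrative assumptions, and the printed numbers are not the paper's results.

```python
import numpy as np

def rotate_rows(X, angles):
    # Rotate each row vector of X by an angle (a scalar, or one angle per row).
    c, s = np.cos(angles), np.sin(angles)
    return np.stack([c * X[:, 0] - s * X[:, 1], s * X[:, 0] + c * X[:, 1]], axis=1)

def world_step(X):
    # Hypothetical SO(2)-equivariant ground-truth dynamics: rotate, then shrink by norm.
    r = np.linalg.norm(X, axis=1, keepdims=True)
    return rotate_rows(X, 0.5) / (1.0 + 0.2 * r)

def nn_predict(X_tr, Y_tr, X):
    # Non-equivariant baseline: 1-nearest-neighbour regression in raw coordinates.
    d = ((X[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=-1)
    return Y_tr[d.argmin(axis=1)]

def canon_predict(X_tr, Y_tr, X):
    # Equivariant variant: canonicalize inputs (and training targets) to angle 0,
    # run the same 1-NN regressor there, then rotate the prediction back.
    a_tr = np.arctan2(X_tr[:, 1], X_tr[:, 0])
    a = np.arctan2(X[:, 1], X[:, 0])
    pred = nn_predict(rotate_rows(X_tr, -a_tr), rotate_rows(Y_tr, -a_tr), rotate_rows(X, -a))
    return rotate_rows(pred, a)

def rel_mse(pred, target):
    return np.sum((pred - target) ** 2) / np.sum(target ** 2)

def sample_slice(rng, n, width=0.25):
    # Radii in [0.5, 2], orientations restricted to the training slice [0, width].
    r = rng.uniform(0.5, 2.0, n)
    phi = rng.uniform(0.0, width, n)
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)

rng = np.random.default_rng(0)
X_tr = sample_slice(rng, 400)
Y_tr = world_step(X_tr)
X_te = sample_slice(rng, 256)

ratios = {}
for name, model in [("equivariant", canon_predict), ("baseline", nn_predict)]:
    in_slice = rel_mse(model(X_tr, Y_tr, X_te), world_step(X_te))
    worst = max(
        rel_mse(model(X_tr, Y_tr, rotate_rows(X_te, t)), world_step(rotate_rows(X_te, t)))
        for t in np.linspace(0.5, 2 * np.pi - 0.5, 12)  # orientations outside the slice
    )
    ratios[name] = worst / in_slice
    print(f"{name}: in-slice relMSE {in_slice:.2e}, worst rotated / in-slice = x{ratios[name]:.2f}")
```

The equivariant regressor's ratio is $\times 1.00$ by construction, since a rotated query canonicalizes to the same point as the unrotated one; the baseline's ratio is large because rotated queries land far from every training example.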

[C] The isometry argument extends to closed loop: under a matching equivariant planner, the control trajectory at orientation $g$ is exactly $\rho(g)$ applied to the observed trajectory. Closed-loop error is therefore invariant across the group: exact to the float floor in 2D/$\mathrm{SO}(2)$ on real PushT tasks, and statistically flat in 3D/$\mathrm{SE}(3)$ (with disjoint 95% confidence intervals). We stress-test this prior against Sutton's Bitter Lesson: augmentation, brute-force scaling, and soft equivariance each improve the across-group task metric to some degree, but none reaches float-floor exactness. Because equivariance is closed under composition, $H$-fold rollouts stay flat ($\times 1.00$, $\le 2\times 10^{-7}$) at every horizon, whereas the baseline's residual compounds with $H$. Out of scope: task-success sweeps, planner-free invariance, and scaling.
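The closed-loop and horizon claims can be sketched in the same toy setting. Assuming (hypothetically) an SO(2)-equivariant latent predictor and a feedback "planner" whose action is a fixed scaled rotation of the goal error, rotating both the start state and the goal by $g$ rotates every state of the $H$-step rollout by $\rho(g)$, so a distance-based tracking cost is unchanged. The planner, gains, and cost here are illustrative assumptions rather than the paper's planner, and only the equivariant side is shown; the baseline's compounding residual is not modeled.

```python
import numpy as np

def rot(theta):
    """rho(g) for g in SO(2): a 2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def predictor(z):
    # Hypothetical equivariant latent dynamics: fixed rotation times a norm-dependent gain.
    return (rot(0.3) @ z) * (1.0 + 0.05 * np.tanh(np.linalg.norm(z)))

def planner(z, goal):
    # Hypothetical equivariant controller: a scaled rotation of the goal error,
    # so rotating (z, goal) by g rotates the action by g.
    return 0.5 * (rot(0.1) @ (goal - z))

def closed_loop(z0, goal, horizon):
    # Roll the predictor forward `horizon` steps with the planner in the loop.
    traj = [z0]
    z = z0
    for _ in range(horizon):
        z = predictor(z) + planner(z, goal)
        traj.append(z)
    return np.stack(traj)  # shape (horizon + 1, 2)

def tracking_cost(traj, goal):
    # Mean distance to the goal; invariant when traj and goal are rotated together.
    return np.linalg.norm(traj - goal, axis=1).mean()

z0 = np.array([1.0, 0.5])
goal = np.array([-0.3, 0.8])
ref = closed_loop(z0, goal, horizon=50)

for theta in [0.7, 2.0, 4.5]:
    R = rot(theta)
    moved = closed_loop(R @ z0, R @ goal, horizon=50)
    per_step_gap = np.abs(moved - ref @ R.T).max(axis=1)  # one entry per horizon 0..H
    print(f"theta={theta}: worst gap over all horizons {per_step_gap.max():.1e}, "
          f"cost {tracking_cost(moved, R @ goal):.6f} vs {tracking_cost(ref, goal):.6f}")
```

Because each step is a composition of equivariant maps, the rotated rollout matches $\rho(g)$ times the reference rollout at every horizon, up to rounding.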


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
