Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group
Title: Preserving Exact Equivariance During Training Enables Zero-Shot Generalization Across Symmetry Groups
Abstract: A latent world model comprising an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry within its training loss. When the world's dynamics genuinely exhibit a group $G$ acting on latents via an orthogonal representation $\rho(g)$, the relative mean squared error (relMSE) for one-step predictions remains exactly invariant across the entire group. Consequently, fitting the dynamics to a restricted slice of orientations mathematically determines the behavior across the whole orbit. We verify this end-to-end at laptop scale (CPU/MPS, fully seeded).
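The invariance claim follows from a short isometry argument. As a sketch, assume relMSE is the squared one-step prediction error normalized by the squared norm of the target latent (this normalization is an assumption; the argument only needs both numerator and denominator to be norms of latents). Write $x'$ for the successor of observation $x$, so that by the symmetry of the dynamics the successor of $g\cdot x$ is $g\cdot x'$, and let $e$ denote the identity of $G$:

$$
\begin{aligned}
\mathrm{relMSE}(g) &= \frac{\mathbb{E}\,\lVert f(E(g\cdot x)) - E(g\cdot x')\rVert^2}{\mathbb{E}\,\lVert E(g\cdot x')\rVert^2} \\
&= \frac{\mathbb{E}\,\lVert \rho(g)\,[\,f(E(x)) - E(x')\,]\rVert^2}{\mathbb{E}\,\lVert \rho(g)\,E(x')\rVert^2} \\
&= \frac{\mathbb{E}\,\lVert f(E(x)) - E(x')\rVert^2}{\mathbb{E}\,\lVert E(x')\rVert^2} = \mathrm{relMSE}(e).
\end{aligned}
$$

The second line uses the equivariance of the encoder, $E(g\cdot x)=\rho(g)E(x)$, and of the predictor, $f(\rho(g)z)=\rho(g)f(z)$; the third uses $\lVert\rho(g)v\rVert=\lVert v\rVert$ for orthogonal $\rho(g)$.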
[A] The symmetry persists through a real Muon/AdamW + EMA + VICReg training run. The composed encode-then-predict residual remains at $\sim 10^{-6}$ after optimization: the symmetry holds not just at initialization but after training, under any optimizer.
[B] One-step error remains flat to five digits across the group. In contrast, a non-equivariant baseline of the same hypothesis class fits the training slice but fails out of distribution: the out-of-distribution error multiplier is $\times 1.00$ for VN versus $\times 13.8$ for the baseline in 2D, $\times 17.2$ in 3D, and $\times 157$ over the full $\mathrm{SE}(3)$ hierarchy. The equivariant model is also $4.5$ to $7.4\times$ smaller.
[C] This isometry argument extends to closed-loop scenarios: under a matching equivariant planner, the control trajectory at orientation $g$ is exactly $\rho(g)$ applied to the observed one. Closed-loop error is therefore invariant across the group: it is float-floor exact in 2D/$\mathrm{SO}(2)$ on real PushT tasks and statistically flat in 3D/$\mathrm{SE}(3)$ (with disjoint 95% confidence intervals). We stress-test the prior against Sutton's Bitter Lesson: augmentation, brute-force scaling, and soft equivariance each improve the across-group task metric to some degree, but none achieves float-floor exactness. Because equivariance is closed under composition, $H$-fold rollouts remain flat ($\times 1.00$, $\le 2\times 10^{-7}$) at every horizon, whereas the baseline's residual compounds with $H$. Out of scope: task-success sweeps, planner-free invariance, and scaling.
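The flatness claims in [B] and [C] can be illustrated numerically on a toy problem. The sketch below is not the paper's architecture or data: it assumes $G=\mathrm{SO}(2)$ acting on latents made of `k` two-dimensional vectors, a hypothetical ground-truth step (`step_true`), an imperfect equivariant predictor (`step_eq`), and a generic non-equivariant baseline (`step_base`). All names, sizes, and noise scales are illustrative assumptions. It compares the relative MSE of `H`-step rollouts across eight orientations.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, H = 4, 256, 5  # channels, samples, rollout horizon (illustrative values)

def rot(theta):
    """2x2 rotation matrix, i.e. rho(g) for g in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def act(z, theta):
    """Apply the group action to latents of shape (n, k, 2)."""
    return z @ rot(theta).T

def squash(y):
    """Norm-based nonlinearity; commutes with rotations because norms do."""
    return y / (1.0 + np.linalg.norm(y, axis=-1, keepdims=True))

# Toy ground-truth dynamics (assumption): channel mixing, then squash.
A_true = rng.normal(size=(k, k)) / np.sqrt(k)
def step_true(z):
    return squash(A_true @ z)

# Equivariant predictor: same functional form, deliberately imperfect weights.
W_eq = A_true + 0.05 * rng.normal(size=(k, k))
def step_eq(z):
    return squash(W_eq @ z)

# Non-equivariant baseline: a generic matrix acting on the flattened 2k-vector.
M_base = np.kron(W_eq, np.eye(2)) + 0.05 * rng.normal(size=(2 * k, 2 * k))
def step_base(z):
    flat = z.reshape(len(z), 2 * k) @ M_base.T
    return squash(flat.reshape(len(z), k, 2))

def rollout(step, z, horizon):
    """Compose a one-step map with itself `horizon` times."""
    for _ in range(horizon):
        z = step(z)
    return z

def rel_mse(pred, target):
    """Squared error normalized by target energy (assumed relMSE definition)."""
    return np.sum((pred - target) ** 2) / np.sum(target ** 2)

# Anisotropic data: every latent vector points roughly along the x-axis,
# standing in for a restricted slice of orientations.
z0 = np.array([1.0, 0.0]) + 0.1 * rng.normal(size=(n, k, 2))

thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
eq_errs, base_errs = [], []
for theta in thetas:
    zg = act(z0, theta)                      # same data, seen at orientation g
    target = rollout(step_true, zg, H)
    eq_errs.append(rel_mse(rollout(step_eq, zg, H), target))
    base_errs.append(rel_mse(rollout(step_base, zg, H), target))
eq_errs, base_errs = np.array(eq_errs), np.array(base_errs)

print("equivariant relative spread:", np.ptp(eq_errs) / eq_errs.mean())
print("baseline relative spread:   ", np.ptp(base_errs) / base_errs.mean())
```

In this toy setting the equivariant predictor is wrong by the same amount at every orientation, so its error curve is flat up to floating-point rounding even after `H` composed steps, while the baseline's error depends on the orientation. The sketch does not reproduce the paper's reported multipliers, which come from trained models.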
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



