General Covariant Action Modeling: Constructing Generalized Manifolds via Spatio-Temporal Decoupling
Abstract:
Robust generalization from scarce data remains a central challenge in embodied intelligence. Current approaches typically regress absolute coordinates, which violates the principle of general covariance. This strategy conflates intrinsic task geometry with rigid execution protocols, tying policies to particular motion styles and constant speeds. To address this limitation, we introduce the Generalized Action Manifold (GAM) framework, which enforces general covariance through structural disentanglement. GAM constructs the manifold by imposing invariance along two orthogonal axes. The first is Temporal Invariance: an Arc-Length Parameterizer separates spatial path geometry from temporal dynamics, making the policy robust to velocity variations. The second is Geometric Invariance: a Schema-Affine-Factorization mechanism projects trajectories into canonical “world lines” within a pose-normalized coordinate system, isolating invariant geometric schemas from affine modulations and thereby supporting spatial generalization. Embedded within a structured Vision-Language-Action (VLA) architecture, GAM allows sparse demonstrations to densely populate a continuous, valid action manifold. Our empirical results indicate that GAM improves transfer performance and robustness over geometry-agnostic baselines.
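The idea behind separating path geometry from timing can be illustrated with plain arc-length resampling. The sketch below is not the paper's Arc-Length Parameterizer, whose details the abstract does not give. It is a minimal, assumed stand-in: the function name `arc_length_resample`, the 2D polyline input, and the linear interpolation are all hypothetical choices made for illustration.

```python
import math
from bisect import bisect_right


def arc_length_resample(points, num_samples):
    """Resample a polyline at equal arc-length intervals.

    Hypothetical helper for illustration only; not the paper's method.
    points: sequence of coordinate tuples recorded at arbitrary times.
    Returns num_samples points equally spaced along the traced path.
    """
    if len(points) < 2 or num_samples < 2:
        raise ValueError("need at least two points and two samples")

    # Cumulative arc length at each recorded point.
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]
    if total == 0.0:
        # Degenerate path: every point coincides.
        return [tuple(points[0])] * num_samples

    resampled = []
    for k in range(num_samples):
        s = total * k / (num_samples - 1)  # target arc length
        # Index of the segment containing s, clamped to the last segment.
        i = min(bisect_right(cum, s) - 1, len(points) - 2)
        seg = cum[i + 1] - cum[i]
        t = 0.0 if seg == 0.0 else (s - cum[i]) / seg
        p, q = points[i], points[i + 1]
        resampled.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return resampled


# The same L-shaped path recorded with two different speed profiles.
slow_start = [(0, 0), (0.1, 0), (0.3, 0), (1.0, 0), (2, 0), (2, 1.5), (2, 2)]
fast_start = [(0, 0), (1.5, 0), (2, 0), (2, 0.2), (2, 2)]

path_a = arc_length_resample(slow_start, 5)
path_b = arc_length_resample(fast_start, 5)
```

Both recordings trace the same geometry, so `path_a` and `path_b` agree up to floating-point rounding even though the original samples were spaced very differently in time. A representation built on the resampled points therefore carries no information about execution speed.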
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





