arXiv

The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

June 2, 2026 · Bing-Cheng Chuang, I-Hsuan Chu, Bor-Jiun Lin, YuanFu Yang, Min Sun, Chun-Yi Lee

Title: Challenging the Euclidean Assumption: Rectifying Geometric Errors in Vision-Language-Action Policies Through Tangent Space Score Matching

Abstract: While diffusion-driven Vision-Language-Action (VLA) policies have demonstrated significant prowess in robotic manipulation, they suffer from a critical geometric oversight we identify as the Euclidean Fallacy: the reduction of SE(3) poses into flat $\mathbb{R}^{12}$ vectors. This simplification leads to three primary issues: (1) manifold drift that breaches SO(3) constraints, (2) a loss of equivariance when coordinate systems change, and (3) non-geodesic paths that incur unnecessary kinematic costs. To address these limitations, we propose the Lie Diffuser Actor (LDA), a diffusion framework designed to operate natively on the SE(3) manifold. LDA introduces noise via left-invariant stochastic differential equations (SDEs), computes scores within the tangent space, and uses the exponential map to retract samples onto the manifold. By design, this approach prevents manifold drift, ensures invariance to coordinate frame transformations, and achieves geodesic optimality. In evaluations on the CALVIN ABC$\rightarrow$D benchmark, LDA increases the average task length from $3.27$ to $3.51$ ($+7.3\%$). Additionally, real-world robotic experiments confirm that our method outperforms the baseline on most tasks.
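The manifold-drift claim can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the SO(3) rotation block of a pose, uses the Python standard library, and every name in it (`so3_exp`, `euclidean_noise`, `tangent_perturb`, `orthogonality_error`) is hypothetical. It contrasts adding Gaussian noise to the flattened rotation entries with perturbing the rotation by the exponential of a tangent vector, which is one simple discrete analogue of a left-invariant noising step.

```python
import math
import random

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """3x3 matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def so3_exp(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-8:
        a, b = 1.0, 0.5  # leading Taylor coefficients near zero rotation
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K = hat(w)
    K2 = matmul(K, K)
    return [[I3[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero for a valid rotation matrix."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    RtR = matmul(Rt, R)
    return math.sqrt(sum((RtR[i][j] - I3[i][j]) ** 2
                         for i in range(3) for j in range(3)))

def euclidean_noise(R, sigma, rng):
    """Flat-vector treatment: add i.i.d. Gaussian noise to each matrix entry."""
    return [[R[i][j] + rng.gauss(0.0, sigma) for j in range(3)]
            for i in range(3)]

def tangent_perturb(R, xi):
    """Tangent-space treatment: right-multiply R by exp(xi), staying on SO(3)."""
    return matmul(R, so3_exp(xi))

if __name__ == "__main__":
    rng = random.Random(0)
    R = so3_exp([0.3, -0.2, 0.5])
    xi = [rng.gauss(0.0, 0.1) for _ in range(3)]
    print("flat-vector noise error:",
          orthogonality_error(euclidean_noise(R, 0.1, rng)))
    print("tangent-space noise error:",
          orthogonality_error(tangent_perturb(R, xi)))
```

In this sketch the flat-vector perturbation leaves the rotation group, while the tangent-space perturbation stays orthogonal up to floating-point rounding. Because the increment is applied on the right, left-multiplying by any fixed rotation `H` commutes with the perturbation: `matmul(H, tangent_perturb(R, xi))` equals `tangent_perturb(matmul(H, R), xi)` up to rounding, which is the frame-change behavior the abstract describes.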
