AttenA+: Rectifying Action Inequality in Robotic Foundation Models
Title: AttenA+: Correcting Action Inequality in Robotic Foundation Models
Abstract:
Current robotic foundation models, despite their impressive capabilities, rely on an implicit assumption of temporal homogeneity. By treating every action as equally informative during optimization, these models adopt a "flat" training paradigm inherited from language modeling, which ignores the underlying physical hierarchy of manipulation. In practice, robot trajectories are inherently heterogeneous. Low-velocity segments, which require precision for critical interactions, often determine task success, whereas high-velocity motions function as error-tolerant transitions. This disconnect between uniform loss weighting and physical significance fundamentally restricts the performance of Vision-Language-Action (VLA) and World-Action Models (WAM) in complex, long-horizon scenarios.
To address this issue, we propose AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments through velocity-driven action attention. AttenA+ aligns the model’s learning capacity with the physical demands of manipulation by reweighting the training objective according to the inverse velocity field, so that slow, precision-critical steps contribute more to the loss than fast transitions. As a modular, plug-and-play enhancement, it can be incorporated into existing backbones without structural changes or additional parameters.
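The abstract does not give the exact weighting formula, so the following is only a minimal sketch of what inverse-velocity loss reweighting could look like, not the authors' implementation. It assumes that speed is estimated by finite differences between consecutive target actions, that weights are proportional to 1 / (speed + eps), and that they are normalized to mean 1 so the overall loss scale is unchanged. The function names, the `eps` value, and the choice of a mean-squared-error base loss are all hypothetical.

```python
import math


def inverse_velocity_weights(actions, eps=1e-3):
    """Per-step weights proportional to 1 / (speed + eps), normalized to mean 1.

    actions: list of T action vectors (lists of floats), e.g. end-effector
    positions. Speed at step t is the Euclidean norm of the difference between
    consecutive actions (an assumption; the source does not define it).
    """
    T = len(actions)
    if T < 2:
        return [1.0] * T
    # Finite-difference speed for steps 1..T-1; step 0 reuses the first value.
    speeds = [math.dist(actions[t], actions[t - 1]) for t in range(1, T)]
    speeds = [speeds[0]] + speeds
    raw = [1.0 / (s + eps) for s in speeds]
    mean = sum(raw) / T
    return [w / mean for w in raw]


def weighted_mse(pred, target, weights):
    """Mean over time steps of weight * per-step mean squared error."""
    total = 0.0
    for p, y, w in zip(pred, target, weights):
        step_err = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)
        total += w * step_err
    return total / len(pred)


# Toy trajectory: two fast steps followed by two slow, precise steps.
target = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.1, 0.0], [2.2, 0.0]]
weights = inverse_velocity_weights(target)

# A prediction that is off by the same amount at every step.
pred = [[x + 0.05, y] for x, y in target]
loss = weighted_mse(pred, target, weights)
```

Because the weights depend only on the target trajectory, a scheme like this adds no parameters and leaves the backbone untouched, which is consistent with the plug-and-play claim. With mean-1 normalization, a uniform per-step error yields the same loss as plain MSE; the weighting only matters when errors differ between slow and fast segments.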
Extensive experiments show that AttenA+ raises the performance ceiling of state-of-the-art models. Specifically, it boosts OpenVLA-OFT to 98.6% (+1.5%) on the LIBERO benchmark and elevates FastWAM to 92.4% (+0.6%) on RoboTwin 2.0. Further validation on a real-world Franka manipulator demonstrates its robustness and its ability to generalize across tasks. Our findings indicate that leveraging the intrinsic structural priors of action sequences provides a highly efficient, physics-aware alternative to standard scaling laws, opening new avenues for general-purpose robotic control.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





