arXiv

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

June 3, 2026 · Josef Chen

Abstract:

Although the KV-cache is the optimal memory solution for datacenters, it is ill-suited to robotic applications. Datacenter inference batches many short requests that are then reset, so the cost of the attention cache is amortized across a large user base. Embodied agents, by contrast, run a single long, non-resetting episode on bandwidth-constrained edge hardware. In that setting, high-bandwidth memory and flash storage are scarce, flash has limited write endurance, and memory writes, rather than compute, often become the primary bottleneck.

AURA-Mem (Action-Utility Recurrent Adaptive Memory) is designed for this operating regime. It wraps a frozen vision-language-action backbone with a fixed-size recurrent memory and a learned gate. The gate writes to memory only when the current observation is likely to change the next action, so the memory stays silent whenever a write would not matter. Unlike reconstruction-based memory models, the gate is trained directly against a closed-loop action-error signal.
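
As a rough illustration of the gating idea described above, the following Python sketch writes to a fixed-size state only when the write would change the policy's action by more than a threshold. It is a hand-coded stand-in, not the learned gate from the paper: the class `ActionGatedMemory`, the `policy` callable, the moving-average update, and the `threshold` and `decay` parameters are all assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical policy signature: (observation features, memory state) -> action.
Policy = Callable[[Sequence[float], Sequence[float]], List[float]]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ActionGatedMemory:
    """Fixed-size memory that is written only when the write changes the action.

    Assumption: the gate here is a simple threshold on action change, used
    as a stand-in for a learned gate.
    """

    def __init__(self, size: int, threshold: float, decay: float = 0.9) -> None:
        self.state: List[float] = [0.0] * size  # never grows with the horizon
        self.threshold = threshold
        self.decay = decay
        self.writes = 0  # number of memory writes performed so far

    def step(self, obs: Sequence[float], policy: Policy) -> List[float]:
        # Assumption: obs has the same length as the memory state.
        # Candidate state if we were to write this observation.
        candidate = [
            self.decay * s + (1.0 - self.decay) * o
            for s, o in zip(self.state, obs)
        ]
        action_keep = policy(obs, self.state)   # action without writing
        action_write = policy(obs, candidate)   # action after writing
        # Write only if the observation would alter the action enough.
        if l2_distance(action_keep, action_write) > self.threshold:
            self.state = candidate
            self.writes += 1
            return action_write
        return action_keep
```

In this sketch the state length is set once at construction and never changes, and `writes` counts how often the gate fires, which is the quantity a write-reduction comparison would be based on.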

The inference state of AURA-Mem stays constant at 4,224 bytes regardless of the time horizon. By comparison, a standard KV-cache grows to 6,061 times its size over 100,000 steps. On a controlled synthetic benchmark, AURA-Mem achieves accuracy comparable to the best O(1) baselines while requiring 5.19 to 6.13 times fewer writes. On simpler configurations, the write reduction reaches 9.19 times. Budget-matched random and periodic write schedules fail to replicate these gains, indicating that the benefit comes from the action-surprise signal rather than from writing less often.
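
To make the budget-matched controls concrete, here is a minimal Python sketch of how periodic and random write schedules with the same number of writes could be generated. The function names, the even-spacing rule, and the fixed seed are assumptions for illustration; the paper's exact baseline construction is not given in the abstract.

```python
import random
from typing import List


def periodic_schedule(horizon: int, budget: int) -> List[int]:
    """Return `budget` write steps spaced evenly over range(horizon)."""
    if not 0 <= budget <= horizon:
        raise ValueError("budget must be between 0 and horizon")
    # Integer spacing keeps the steps distinct whenever budget <= horizon.
    return [(i * horizon) // budget for i in range(budget)]


def random_schedule(horizon: int, budget: int, seed: int = 0) -> List[int]:
    """Return `budget` distinct write steps drawn uniformly from range(horizon)."""
    if not 0 <= budget <= horizon:
        raise ValueError("budget must be between 0 and horizon")
    rng = random.Random(seed)  # local generator, so results are reproducible
    return sorted(rng.sample(range(horizon), budget))
```

Both schedules spend exactly `budget` writes, so any accuracy gap between them and a gated policy with the same write count reflects when the writes happen rather than how many there are.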

In closed-loop evaluations with a trained OpenVLA-OFT 7B model on LIBERO-Long (n=60 episodes per arm), gating did not reduce the success rate. AURA-Mem matched the success rate of the ungated base policy (0.233) and slightly exceeded that of the always-write KV arm (0.217), while using 7.0 times fewer writes and constant memory. In addition, an approximate-information-state value-loss bound was instantiated to demonstrate the methodology; at this scale, the bound is a theoretical illustration rather than a strict guarantee.
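
For a sense of scale, the reported rates are consistent with 14 and 13 successes out of 60 episodes. The Python sketch below checks that arithmetic and computes Wilson score intervals for both arms. This calculation is an illustrative assumption and not part of the paper's reported analysis; the success counts are inferred from the rounded rates, and the 95% level (`z = 1.96`) is chosen here for the example.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumed counts: inferred from the rounded rates with n = 60 per arm.
gated_low, gated_high = wilson_interval(14, 60)    # 14/60 rounds to 0.233
always_low, always_high = wilson_interval(13, 60)  # 13/60 rounds to 0.217

# True when the two intervals overlap.
intervals_overlap = gated_low < always_high and always_low < gated_high
```

Under these assumptions the two arms differ by a single episode, which fits the abstract's framing that gating does not compromise success rather than that it improves it.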
