arXiv

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

June 2, 2026 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

Abstract:

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising route toward embodied agents that improve themselves in dynamic, open-ended environments. Conventional wisdom in continual learning holds that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, so intricate CRL methods are needed to mitigate it. In this work, we revisit that view through a systematic study of CRL for large pretrained VLAs across a variety of lifelong RL benchmarks.
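One common way to quantify catastrophic forgetting, and the plasticity it is traded against, is to evaluate every task after each training stage and summarize the resulting success-rate matrix. The Python sketch below assumes widely used definitions (plasticity as success on a task right after training on it; forgetting as the drop from a task's best earlier success to its success after the final task). The function name `continual_metrics` and the matrix values are hypothetical, and the paper's own metrics may be defined differently.

```python
import numpy as np


def continual_metrics(R: np.ndarray) -> dict:
    """Summarize a sequential run from a success-rate matrix (assumed definitions).

    R[i, j] is the success rate on task j, evaluated after training on tasks 0..i in order.
    """
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    final = R[-1]
    # Plasticity proxy: success on each task measured right after training on it.
    plasticity = float(np.mean(np.diag(R)))
    # Forgetting for task j: best success at any stage j..T-2, minus success after the last task.
    # A negative drop would mean later training improved task j (backward transfer).
    drops = [R[j:T - 1, j].max() - final[j] for j in range(T - 1)]
    forgetting = float(np.mean(drops)) if drops else 0.0
    return {"final_avg": float(final.mean()), "plasticity": plasticity, "forgetting": forgetting}


# Hypothetical success rates for three tasks, for illustration only.
R = np.array([
    [0.90, 0.10, 0.05],
    [0.85, 0.80, 0.10],
    [0.88, 0.78, 0.90],
])
print(continual_metrics(R))
```

Under these definitions, "little to no forgetting" corresponds to a forgetting value near zero while plasticity stays high.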

Contrary to this assumption, we find that simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, often outperforming more sophisticated CRL methods. Our analysis shows that this robustness arises from the interplay of three components: the scale of the pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. These results position Sequential Fine-Tuning as a strong method for continual RL with VLAs and offer new insight into lifelong learning in the era of large models. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
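Seq. FT with LoRA can be pictured as follows: the pretrained weight W stays frozen, a low-rank update (alpha / r) · B A is learned on top of it, and the same adapter simply keeps training on each new task in turn, with no replay buffer or regularizer. The NumPy toy below is a sketch under those assumptions, not the authors' implementation. `LoRALinear`, `sgd_step`, `sequential_finetune`, and the synthetic tasks are hypothetical, and the squared-error loss is a stand-in for the on-policy RL objective, used only to keep the example small.

```python
import numpy as np


class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, W: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W.copy()                                           # frozen: never updated below
        self.A = rng.normal(0.0, 1.0 / np.sqrt(d_in), (rank, d_in))  # trainable, Gaussian init
        self.B = np.zeros((d_out, rank))                            # trainable, zero init
        self.scale = alpha / rank

    def effective_weight(self) -> np.ndarray:
        return self.W + self.scale * self.B @ self.A

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.effective_weight().T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


def sgd_step(layer: LoRALinear, x: np.ndarray, y: np.ndarray, lr: float) -> float:
    """One gradient step on 0.5 * mean squared error, updating only A and B (stand-in loss)."""
    err = layer(x) - y                         # (batch, d_out)
    loss = 0.5 * float(np.mean(np.sum(err ** 2, axis=1)))
    G = err.T @ x / x.shape[0]                 # dLoss/dW_eff, shape (d_out, d_in)
    grad_B = layer.scale * G @ layer.A.T       # (d_out, rank)
    grad_A = layer.scale * layer.B.T @ G       # (rank, d_in)
    layer.B -= lr * grad_B
    layer.A -= lr * grad_A
    return loss


def sequential_finetune(layer: LoRALinear, tasks, steps: int = 300, lr: float = 0.1):
    """Naive Seq. FT: keep training the same adapter on each task in order."""
    history = []
    for x, y in tasks:
        losses = [sgd_step(layer, x, y, lr) for _ in range(steps)]
        history.append((losses[0], losses[-1]))  # (loss at task start, loss at task end)
    return history


# Usage on synthetic tasks: each task shifts the frozen weight by a hypothetical rank-1 delta.
rng = np.random.default_rng(1)
d, n = 8, 64
W0 = rng.normal(size=(d, d)) / np.sqrt(d)      # stand-in "pretrained" weight
layer = LoRALinear(W0, rank=2, alpha=2.0)
tasks = []
for _ in range(3):
    x = rng.normal(size=(n, d))
    delta = np.outer(rng.normal(size=d), rng.normal(size=d)) / d
    tasks.append((x, x @ (W0 + delta).T))
history = sequential_finetune(layer, tasks)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained layer exactly. Here the adapter trains 32 parameters against 64 in W; for a 4096×4096 projection at rank 16 it would train 131,072 parameters against about 16.8 million, which illustrates the parameter savings LoRA provides.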
