PHASER: Phase-Aware and Semantic Experience Replay for Vision-Language-Action Models
Abstract:
Vision-Language-Action (VLA) models have demonstrated significant proficiency in language-guided robotic manipulation. Deploying them in open-ended settings, however, requires continually acquiring new skills, which frequently induces severe catastrophic forgetting of previously mastered behaviors. Experience replay (ER) is the conventional countermeasure, but simple uniform sampling does not match the temporal structure of manipulation sequences. Uniform sampling systematically under-represents brief yet causally vital sub-skills, an effect referred to here as phase starvation, and it ignores the fact that different historical tasks are forgotten at different rates.
To address these challenges, we present PHASER, a continual learning framework that remains agnostic to specific architectures. PHASER utilizes a phase-centric capacity allocation mechanism to ensure that all sub-skills receive equitable memory support. Additionally, it incorporates a multi-modal interference routing strategy designed to dynamically prioritize historical phases that are most susceptible to forgetting. To facilitate fully autonomous lifelong adaptation, we embed Auto-PC, a streamlined pipeline that merges unsupervised change-point detection for action signals with semantic verification via Vision-Language Models (VLMs). This integration allows for the extraction of temporal boundaries without the need for extensive manual labeling.
Evaluations across three distinct VLA backbones on the LIBERO continual learning suites show substantial empirical gains. PHASER improves the Average Success Rate (ASR) by as much as 31% compared to matched-budget ER methods and reaches an 87.8% final ASR in the LIBERO-Goal CL configuration.
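The phase-starvation effect and the idea of equitable per-phase memory can be illustrated with a small sketch. This is not the PHASER allocation rule, which the abstract does not specify. It assumes a hypothetical four-phase episode (the phase names and lengths are invented for illustration) and a simple equal-split rule with redistribution of unused slots. The function names `expected_uniform_slots` and `phase_balanced_slots` are likewise hypothetical.

```python
# Illustrative sketch only: an equal-split rule standing in for
# "phase-centric capacity allocation". All names and numbers are assumptions.

def expected_uniform_slots(phase_lengths, budget):
    """Expected replay slots per phase when timesteps are sampled uniformly."""
    total = sum(phase_lengths.values())
    return {p: budget * n / total for p, n in phase_lengths.items()}


def phase_balanced_slots(phase_lengths, budget):
    """Split the budget equally across phases, capped by each phase's length.

    Slots that a short phase cannot use are redistributed to the
    phases that still have unsampled timesteps.
    """
    slots = {p: 0 for p in phase_lengths}
    remaining = budget
    open_phases = list(phase_lengths)
    while remaining > 0 and open_phases:
        # Give every open phase an equal share (at least one slot).
        share = max(remaining // len(open_phases), 1)
        for p in list(open_phases):
            if remaining == 0:
                break
            give = min(share, phase_lengths[p] - slots[p], remaining)
            slots[p] += give
            remaining -= give
            if slots[p] == phase_lengths[p]:
                open_phases.remove(p)  # phase is fully stored
    return slots


# Hypothetical episode: long reach/transport phases, brief grasp/release.
phases = {"reach": 60, "grasp": 5, "transport": 50, "release": 5}
budget = 24

uniform = expected_uniform_slots(phases, budget)
balanced = phase_balanced_slots(phases, budget)
for name in phases:
    print(f"{name:10s} uniform={uniform[name]:5.1f}  balanced={balanced[name]}")
```

Under these assumed numbers, uniform sampling is expected to keep only one grasp sample out of 24, while the equal-split rule stores every grasp and release timestep and spreads the remainder over the longer phases. A forgetting-aware scheme of the kind the abstract describes would additionally reweight these shares by how vulnerable each historical phase is, which this sketch does not attempt.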
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



