arXiv

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

June 4, 2026 · Jingwen Chen, Wenkai Yang, Shengda Fan, Wenbo Nie, Chenxing Sun, Shaodong Zheng, Yangen Hu, Lu Pan, Ke Zeng, Yankai Lin

Abstract

Transforming contextual insights from historical interactions into reusable parametric knowledge represents a promising avenue for enabling continual learning in large language models (LLMs). Although previous research has largely concentrated on single-iteration transfer mechanisms, our investigation reveals a critical flaw: under multi-iteration learning scenarios, current methods exhibit a progressive decline in capability rather than the expected compounding gains. To understand this phenomenon, we conduct a systematic analysis across three essential dimensions of experience internalization.

First, regarding Experience Granularity, we determine that principle-level experiences are more enduring than instance-level ones. This durability stems from the ability of principle-level data to abstract transferable strategies, effectively stripping away details specific to particular trajectories.

Second, in terms of the Experience Injection Pattern, our findings indicate that step-wise injection is superior to global injection. By synchronizing experience with intermediate decision states, step-wise injection proves particularly vital for tasks requiring long-horizon tool utilization.
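
The contrast between the two injection patterns can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the prompt layout, and the word-overlap relevance score are all assumptions made for illustration. Global injection places every experience once at the start of the episode, while step-wise injection builds one prompt per decision step and attaches only the experience matched to that intermediate state.

```python
from typing import List


def build_global_prompt(task: str, experiences: List[str]) -> str:
    """Global injection: all experience is placed once, before the episode starts."""
    bullet_list = "\n".join(f"- {e}" for e in experiences)
    return f"Experience:\n{bullet_list}\n\nTask: {task}"


def select_for_state(state: str, experiences: List[str], k: int = 1) -> List[str]:
    """Toy relevance score (an assumption): rank experiences by word overlap with the state."""
    state_words = set(state.lower().split())
    ranked = sorted(
        experiences,
        key=lambda e: len(state_words & set(e.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_stepwise_prompts(task: str, states: List[str], experiences: List[str]) -> List[str]:
    """Step-wise injection: one prompt per decision step, each with state-matched experience."""
    prompts = []
    for step, state in enumerate(states, start=1):
        hints = "\n".join(f"- {e}" for e in select_for_state(state, experiences))
        prompts.append(f"Task: {task}\nStep {step} state: {state}\nExperience:\n{hints}")
    return prompts
```

In a long-horizon tool-use episode, the step-wise variant would be called again after each tool result, so the experience shown to the model tracks the current decision state rather than the initial task description alone.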

Third, concerning the Internalization Regime, we show that off-policy context-distillation, when applied to high-quality teacher trajectories, offers a significantly more stable training signal. This contrasts with on-policy context-distillation, which is constrained by local corrections performed on flawed states generated by the student.
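
The difference between the two regimes lies mainly in whose trajectories supply the training signal, which a rough Python sketch can make concrete. Everything here is hypothetical: the `Policy` interface, the `is_high_quality` filter, and the `teacher_correct` callback are illustrative assumptions, not the paper's API, and the actual loss used for distillation is not specified in the abstract.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a policy maps a prompt string to a generated trajectory.
Policy = Callable[[str], str]


def off_policy_examples(
    tasks: List[str],
    experience: str,
    teacher: Policy,
    is_high_quality: Callable[[str], bool],
) -> List[Dict[str, str]]:
    """The teacher acts WITH experience in context. The student is later trained
    to reproduce the kept trajectories from the bare task prompt."""
    examples = []
    for task in tasks:
        trajectory = teacher(f"{experience}\n{task}")
        if is_high_quality(trajectory):  # keep only high-quality teacher rollouts
            examples.append({"prompt": task, "target": trajectory})
    return examples


def on_policy_examples(
    tasks: List[str],
    experience: str,
    student: Policy,
    teacher_correct: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """The student acts WITHOUT experience. The experience-conditioned teacher
    can only correct the states that the student itself reached."""
    examples = []
    for task in tasks:
        rollout = student(task)
        correction = teacher_correct(f"{experience}\n{task}", rollout)
        examples.append({"prompt": task, "student_rollout": rollout, "target": correction})
    return examples
```

In the off-policy sketch the target is a complete teacher trajectory, whereas in the on-policy sketch the target depends on a student rollout that may already have gone wrong, which is one way to read the abstract's remark about local corrections on flawed student-generated states.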

Collectively, these insights establish a straightforward yet resilient framework for stable and sustainable experience internalization, offering practical directions for the development of LLMs capable of self-evolution and continual learning.
