arXiv

Scaling Self-Evolving Agents via Parametric Memory

June 4, 2026 · Tao Ren, Weiyao Luo, Hui Yang, Rongzhi Zhu, Xiang Huang, Yuchuan Wu, Bingxue Chou, Jieping Ye, Jiafeng Liang, Yongbin Li, Yijie Peng

Abstract: Current memory-enhanced LLM agents typically restrict past experiences to the prompt space, utilizing either textual summaries or retrieved passages, while maintaining frozen model parameters during a rollout. Consequently, these systems are limited to looking up prior information rather than learning from it; their decision-making policy remains static regardless of experience, and any details omitted from the context window are lost forever. To address this, we present TMEM, a self-evolving parametric memory framework that enables agents to not only condense history into explicit memory structures but also integrate distilled supervision into fast LoRA weights ($\Delta_t$) through lightweight online updates. This mechanism genuinely modifies the agent’s future behavior within a single episode. We formalize this approach as an agentic decision process characterized by fast-weight rollout dynamics, where actions are drawn from $\pi_{\theta_0+\Delta_t}$, and extraction actions generate supervision signals that update $\Delta_t$ for subsequent decisions. This perspective renders the extraction policy directly optimizable via reinforcement learning: training the base parameters $\theta_0$ enhances not only task-specific actions but also the quality of the data utilized for online LoRA adaptation. Additionally, we introduce SVD-based initialization for the LoRA subspace to expedite online convergence. Evaluations on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench demonstrate that TMEM consistently surpasses summary-based and retrieval-based baselines across various model scales.
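
The fast-weight rollout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, only a minimal example under stated assumptions: a single linear layer stands in for the policy $\pi_{\theta_0+\Delta_t}$, the frozen weights `W0` play the role of $\theta_0$, and the low-rank product `B @ A` plays the role of $\Delta_t$. The squared-error loss, the plain SGD step, the synthetic supervision signal, and the function names (`svd_init`, `online_update`, `run_episode`) are all hypothetical. In particular, the abstract does not say how the SVD-based initialization is constructed; here it is assumed to mean taking the top right singular vectors of `W0` as the rows of `A` with `B` set to zero, so that $\Delta_0 = 0$.

```python
import numpy as np


def svd_init(W0, rank):
    """Assumed SVD-based init: A spans the top-`rank` right singular
    directions of the frozen weights, B is zero, so Delta_0 = B @ A = 0."""
    _, _, Vt = np.linalg.svd(W0, full_matrices=False)
    A = Vt[:rank].copy()                    # (rank, d_in)
    B = np.zeros((W0.shape[0], rank))       # (d_out, rank)
    return A, B


def online_update(W0, A, B, x, y, lr):
    """One SGD step on 0.5 * ||(W0 + B @ A) x - y||^2 with W0 frozen.
    Only the fast weights (A, B) change."""
    err = (W0 + B @ A) @ x - y              # (d_out,)
    grad_B = np.outer(err, A @ x)           # (d_out, rank)
    grad_A = np.outer(B.T @ err, x)         # (rank, d_in)
    return A - lr * grad_A, B - lr * grad_B


def run_episode(seed=0, d_in=8, d_out=4, rank=2, steps=300, lr=0.1):
    """Simulate one episode: at each step a supervision pair (x, y) arrives
    and the fast weights are updated before the next decision."""
    rng = np.random.default_rng(seed)
    W0 = rng.normal(size=(d_out, d_in))     # frozen base parameters
    A, B = svd_init(W0, rank)

    # Hypothetical "experience": a hidden low-rank shift of the base weights,
    # chosen inside the initial subspace to keep the toy problem easy.
    target = W0 + 0.5 * rng.normal(size=(d_out, rank)) @ A

    # Fixed probe inputs used only to measure how far the policy has adapted.
    probes = rng.normal(size=(32, d_in))
    probes /= np.linalg.norm(probes, axis=1, keepdims=True)

    def probe_loss(A, B):
        err = probes @ (W0 + B @ A - target).T
        return 0.5 * float(np.mean(np.sum(err ** 2, axis=1)))

    initial = probe_loss(A, B)
    for _ in range(steps):
        x = rng.normal(size=d_in)
        x /= np.linalg.norm(x)
        y = target @ x                      # stand-in for extracted supervision
        A, B = online_update(W0, A, B, x, y, lr)
    return initial, probe_loss(A, B)


if __name__ == "__main__":
    before, after = run_episode()
    print(f"probe loss before: {before:.4f}, after: {after:.4f}")
```

The point of the sketch is the loop structure: the same frozen `W0` is used throughout, while every later decision is made with updated fast weights, which is what distinguishes this setup from memory kept only in the prompt. In the paper's framing the supervision pairs would come from the agent's own extraction actions rather than from a hidden target.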
