arXiv

Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

Title: Learning Through Action: A Skill-Augmented Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

Abstract:

For Large Language Model (LLM) agents navigating dynamic, interactive settings, lifelong learning is indispensable. Nevertheless, current lifelong learning agents designed for long-horizon tasks generally rely on retrieving discrete skills or past experiences using static parameters during the inference phase. This approach hinders their ability to continuously internalize feedback received at test time, a capability characteristic of human learners. To address this limitation, we introduce Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework tailored for Online Lifelong Learning Agents.

Our approach features Verifier-Guided Skill Learning, a mechanism designed to overcome the scarcity of direct supervision in skill extraction. By rewarding candidate skills based on the average success rate of verifiers across multiple skill-conditioned policy rollouts, this method incentivizes the model to produce skills that are genuinely effective for task resolution, rather than simply appearing plausible in textual form. Additionally, we propose Online Skill Internalization, a process that enhances the policy model during test-time interactions by converting skill-conditioned trajectories into reward signals. This strategy allows the agent to embed reasoning capabilities directly into its parameters, thereby circumventing the issue of context bloat associated with experience retrieval. Evaluations on LifelongAgentBench demonstrate that LifeSkill achieves a 7-point absolute improvement in average performance compared to existing lifelong agent baselines.
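
As a rough illustration of the reward described for Verifier-Guided Skill Learning, the sketch below scores a candidate skill by the mean verifier success over several skill-conditioned rollouts. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `skill_reward`, `rollout`, `verifier`, and `num_rollouts` are hypothetical, and the abstract does not specify how rollouts are sampled or how many are used.

```python
from statistics import mean
from typing import Any, Callable


def skill_reward(
    skill: str,
    task: Any,
    rollout: Callable[[Any, str], Any],
    verifier: Callable[[Any, Any], bool],
    num_rollouts: int = 4,
) -> float:
    """Score a candidate skill by average verifier success.

    All names here are illustrative assumptions:
    - rollout(task, skill) runs the policy conditioned on the skill
      and returns a trajectory.
    - verifier(task, trajectory) returns True if the task was solved.
    """
    if num_rollouts < 1:
        raise ValueError("num_rollouts must be at least 1")

    outcomes = []
    for _ in range(num_rollouts):
        trajectory = rollout(task, skill)
        # Each rollout contributes 1.0 on verified success, else 0.0.
        outcomes.append(1.0 if verifier(task, trajectory) else 0.0)

    # The reward is the fraction of rollouts the verifier accepted.
    return mean(outcomes)
```

Under this reading, a skill that merely sounds plausible but does not help the policy solve the task receives a low reward, because the score depends only on verified outcomes of the rollouts and not on the skill's text.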


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
