Global News Digest

arXiv

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL

Title: ReSkill: Aligning Skill Generation with Policy Optimization in Agentic RL

Abstract:

While Agentic Reinforcement Learning (RL) allows Large Language Model (LLM) agents to refine their behavior through continuous environmental feedback, the resulting policies often fail to systematically build a repository of reusable strategies that can be applied across diverse tasks. Although modular skills offer a solution by providing such generalized strategies, current skill-enhanced RL approaches typically separate skill development from policy optimization. This decoupling creates a risk where adopted skills may contradict the agent’s evolving policy. Drawing inspiration from Anthropic’s Skill Creator, we present ReSkill, an RL-in-the-loop framework designed to harmonize the evolution of skills with policy learning. ReSkill leverages the group-wise structure of GRPO to integrate three key mechanisms with minimal computational overhead: (1) an assertion-driven skill creator that analyzes past failures to propose conditional, trigger-based updates to skills; (2) within-group rollout sampling, which facilitates controlled comparisons between different skill versions to identify the one most beneficial for the policy’s current learning stage; and (3) Thompson Sampling with adaptive discounting, which manages the trade-off between exploration and exploitation when selecting skill versions as the policy develops. Experimental results across multiple domains demonstrate that ReSkill consistently surpasses existing memory-based and skill-based RL methods, achieving its most significant improvements on tasks not seen during training. Furthermore, an analysis of the skill lifecycle reveals that skills are automatically generated, evaluated, refined, and discarded in tandem with policy improvements, effectively demonstrating a reconciled co-evolution of skills and policies.
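As a rough illustration of mechanism (3), the sketch below shows one common way to combine Thompson Sampling with discounting when choosing among skill versions. It is not the paper's implementation. The class name, the fixed `discount` value, the binary success signal, and the version labels are all assumptions made for this example; the abstract describes an adaptive discount, which is simplified here to a constant.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DiscountedThompsonSampler:
    """Beta-Bernoulli Thompson Sampling with exponential forgetting.

    Hypothetical illustration: names and parameters are assumptions,
    not taken from the ReSkill paper.
    """
    discount: float = 0.9  # assumed constant; the paper's scheme is adaptive
    seed: int = 0
    alpha: dict = field(default_factory=dict)  # success pseudo-counts per version
    beta: dict = field(default_factory=dict)   # failure pseudo-counts per version

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def add_version(self, version: str) -> None:
        # Every new skill version starts from a uniform Beta(1, 1) prior.
        self.alpha.setdefault(version, 1.0)
        self.beta.setdefault(version, 1.0)

    def select(self) -> str:
        # Draw one plausible success rate per version and pick the largest draw.
        draws = {v: self.rng.betavariate(self.alpha[v], self.beta[v])
                 for v in self.alpha}
        return max(draws, key=draws.get)

    def update(self, version: str, success: bool) -> None:
        # Shrink all evidence toward the prior so that outcomes observed
        # under an older policy count for less over time.
        for v in self.alpha:
            self.alpha[v] = 1.0 + self.discount * (self.alpha[v] - 1.0)
            self.beta[v] = 1.0 + self.discount * (self.beta[v] - 1.0)
        if success:
            self.alpha[version] += 1.0
        else:
            self.beta[version] += 1.0

    def mean(self, version: str) -> float:
        # Posterior mean success rate for one version.
        return self.alpha[version] / (self.alpha[version] + self.beta[version])


# Usage: two hypothetical skill versions, one success and one failure.
sampler = DiscountedThompsonSampler(discount=0.9, seed=0)
for name in ("skill_v1", "skill_v2"):
    sampler.add_version(name)
sampler.update("skill_v2", success=True)
sampler.update("skill_v1", success=False)
chosen = sampler.select()  # sampled, so either version may be returned
```

Because the discount pulls every count back toward the prior on each update, a version that performed well early on has to keep succeeding to stay favored, which is the exploration and exploitation trade-off the abstract refers to.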


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators about easing ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.