ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
Title: ReSkill: Aligning Skill Generation with Policy Optimization in Agentic RL
Abstract:
While Agentic Reinforcement Learning (RL) allows Large Language Model (LLM) agents to refine their behavior through continuous environmental feedback, the resulting policies often fail to systematically build a repository of reusable strategies that can be applied across diverse tasks. Although modular skills offer a solution by providing such generalized strategies, current skill-enhanced RL approaches typically separate skill development from policy optimization. This decoupling creates a risk where adopted skills may contradict the agent’s evolving policy. Drawing inspiration from Anthropic’s Skill Creator, we present ReSkill, an RL-in-the-loop framework designed to harmonize the evolution of skills with policy learning. ReSkill leverages the group-wise structure of GRPO to integrate three key mechanisms with minimal computational overhead: (1) an assertion-driven skill creator that analyzes past failures to propose conditional, trigger-based updates to skills; (2) within-group rollout sampling, which facilitates controlled comparisons between different skill versions to identify the one most beneficial for the policy’s current learning stage; and (3) Thompson Sampling with adaptive discounting, which manages the trade-off between exploration and exploitation when selecting skill versions as the policy develops. Experimental results across multiple domains demonstrate that ReSkill consistently surpasses existing memory-based and skill-based RL methods, achieving its most significant improvements on tasks not seen during training. Furthermore, an analysis of the skill lifecycle reveals that skills are automatically generated, evaluated, refined, and discarded in tandem with policy improvements, effectively demonstrating a reconciled co-evolution of skills and policies.
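To make mechanism (3) concrete, here is a minimal sketch of discounted Beta-Bernoulli Thompson Sampling over skill versions. It is not the authors' implementation. The abstract does not say how the discount is adapted, so the sketch uses a fixed, hypothetical `discount` value. The class names, the uniform Beta(1, 1) prior, and the success/failure reward encoding are also assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SkillVersionStats:
    """Beta posterior over the success rate of one skill version (assumed model)."""
    alpha: float = 1.0  # 1 + discounted count of successes
    beta: float = 1.0   # 1 + discounted count of failures


@dataclass
class DiscountedThompsonSampler:
    """Illustrative discounted Thompson Sampling; names here are hypothetical."""
    discount: float = 0.9  # fixed here; the abstract describes an adaptive discount
    stats: dict = field(default_factory=dict)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def add_version(self, name):
        # Register a new skill version with a uniform Beta(1, 1) prior.
        self.stats.setdefault(name, SkillVersionStats())

    def select(self):
        # Draw one sample per version from its posterior and pick the largest draw.
        draws = {n: self.rng.betavariate(s.alpha, s.beta)
                 for n, s in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, name, successes, failures):
        # Shrink all accumulated evidence toward the prior so that outcomes
        # observed under an older policy count for less over time.
        for s in self.stats.values():
            s.alpha = 1.0 + self.discount * (s.alpha - 1.0)
            s.beta = 1.0 + self.discount * (s.beta - 1.0)
        # Add the fresh evidence for the version that was just evaluated.
        chosen = self.stats[name]
        chosen.alpha += successes
        chosen.beta += failures
```

In a setup like the one the abstract describes, the rollouts in a group that used a given skill version could be counted as successes or failures and passed to `update`, with `select` choosing the version for the next group. Discounting lets a version that helped an earlier policy lose its advantage once it stops helping the current one.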
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




