Global News Digest

arXiv

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

Abstract: Long-horizon Large Language Model (LLM) agents benefit from reusable skills, but current skill-based approaches typically need an external skill generator during training or continuous skill retrieval at inference. These dependencies add engineering complexity, lengthen the context, and increase deployment latency. To address these challenges, we introduce Self-Internalizing Reinforcement learning with Intrinsic skills (SIRI), a three-phase framework in which the agent discovers, validates, and internalizes skills on its own, with no external skill generator and no inference-time skill repository.

SIRI first warms up the policy with GiGPO to establish basic interaction capabilities and to collect successful trajectories that use no explicit skills. It then performs self-skill mining: the current policy extracts compact skills from its own successful plain (skill-free) rollouts and validates each one by comparing paired skill-augmented and skill-free trajectories. In the final phase, SIRI distills only the action tokens guided by beneficial skills into the plain policy, selecting them with both trajectory-level utility and action-level advantage. At inference time, the agent therefore uses only the original prompt.
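
The validation and token-selection steps described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the skill names, the zero thresholds, and the rule of averaging the paired reward gap are all assumptions made for illustration, since the abstract does not specify how utility or advantage is computed.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class PairedResult:
    """Rewards for one task attempted with and without a candidate skill."""
    reward_with_skill: float
    reward_without_skill: float


def skill_utility(pairs: list[PairedResult]) -> float:
    """Assumed utility: mean reward gap between paired rollouts."""
    return mean(p.reward_with_skill - p.reward_without_skill for p in pairs)


def validate_skills(
    candidates: dict[str, list[PairedResult]], min_utility: float = 0.0
) -> dict[str, float]:
    """Keep only skills whose paired comparison beats min_utility (assumed threshold)."""
    utilities = {name: skill_utility(pairs) for name, pairs in candidates.items()}
    return {name: u for name, u in utilities.items() if u > min_utility}


def distillation_mask(
    trajectory_utility: float, action_advantages: list[float]
) -> list[bool]:
    """Mark action tokens to distill: the trajectory's skill must be beneficial
    and the individual action must have a positive advantage (assumed rule)."""
    if trajectory_utility <= 0.0:
        return [False] * len(action_advantages)
    return [adv > 0.0 for adv in action_advantages]


# Hypothetical skills and rewards, invented for this example.
candidates = {
    "check_receptacles_first": [PairedResult(1.0, 0.0), PairedResult(1.0, 1.0)],
    "repeat_last_action": [PairedResult(0.0, 1.0), PairedResult(1.0, 1.0)],
}
kept = validate_skills(candidates)
print(kept)  # {'check_receptacles_first': 0.5}
print(distillation_mask(kept["check_receptacles_first"], [0.3, -0.1, 0.0, 0.7]))
# [True, False, False, True]
```

In this sketch the second skill is discarded because its paired rollouts do worse with the skill than without it, and only actions with positive advantage inside a beneficial trajectory would contribute to the distillation loss.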

Evaluations on the ALFWorld and WebShop benchmarks with Qwen2.5-7B-Instruct show that SIRI improves on GiGPO, raising scores from 0.908 to 0.930 on ALFWorld and from 0.728 to 0.813 on WebShop. These results surpass prompt-based, RL-based, and memory-augmented baselines. Further analysis indicates that self-mining performs comparably to distillation from closed-source large models. The source code is publicly available at https://github.com/kirito618/SIRI.
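
To put the reported scores in perspective, the short sketch below computes the absolute and relative gains over GiGPO. It uses only the four numbers quoted above; the helper name `gains` is an assumption introduced for this example.

```python
def gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) over the baseline."""
    absolute = improved - baseline
    return absolute, 100.0 * absolute / baseline


# (benchmark, GiGPO score, SIRI score) as reported in the abstract.
reported = [("ALFWorld", 0.908, 0.930), ("WebShop", 0.728, 0.813)]

for name, base, new in reported:
    absolute, relative = gains(base, new)
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
# ALFWorld: +0.022 absolute, +2.4% relative
# WebShop: +0.085 absolute, +11.7% relative
```

The WebShop improvement is the larger of the two, both in absolute terms and relative to the GiGPO baseline.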


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators about easing ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.