Global News Digest

arXiv

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

Title: Addressing Information Self-Locking in Reinforcement Learning for Active Reasoning in LLM Agents

Abstract:

Reinforcement learning (RL) has emerged as the standard framework for developing LLM-based agents capable of acting, interacting, and reasoning across extended task durations. However, our investigation into active reasoning—where agents must actively seek new observations by interacting with their environment to accomplish tasks—reveals a critical flaw in outcome-based RL. We identify a systematic failure mode termed "information self-locking" (SeL), wherein agents struggle to generate informative feedback and fail to integrate the evidence they do obtain.

To dissect this problem, we deconstruct agentic behavior into two interdependent capabilities: Action Selection (AS), which governs the generation of observation streams, and Belief Tracking (BT), which refines the agent’s internal understanding of the task. Both theoretical frameworks and empirical data point to a bidirectional bottleneck that precipitates SeL. Specifically, deficient BT masks the value of informative actions, while inadequate AS denies BT access to crucial evidence. This reciprocal weakness diminishes the learning signal for both components, ultimately resulting in information self-locking.

To address this challenge, we introduce AREW, a straightforward yet powerful Advantage Reweighting technique. AREW leverages readily available directional critiques to redistribute credit throughout trajectories. Our extensive evaluations across nine agentic tasks of diverse complexity demonstrate that AREW effectively alleviates SeL, driving performance improvements of up to 60 points. The source code is accessible at https://github.com/unimpor/T3.
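
As a rough illustration of what "redistributing credit throughout trajectories" could look like, the sketch below spreads a single outcome-based advantage across the steps of one trajectory using per-step directional critiques. This is not the paper's formula: the function name `reweight_advantages`, the `strength` parameter, the +1/0/-1 encoding of critiques, and the mean-centering rule are all assumptions made for this example.

```python
from typing import List


def reweight_advantages(
    trajectory_advantage: float,
    critiques: List[int],
    strength: float = 0.5,
) -> List[float]:
    """Hypothetical per-step advantage reweighting (illustration only).

    trajectory_advantage: one outcome-based advantage shared by every step.
    critiques: assumed encoding, +1 (step judged informative),
               -1 (judged uninformative), 0 (no critique available).
    strength: assumed knob controlling how far credit is shifted.
    """
    if not critiques:
        return []
    # Center the critiques so credit is moved between steps rather than
    # added: the per-step advantages still sum to T * trajectory_advantage.
    mean_critique = sum(critiques) / len(critiques)
    return [
        trajectory_advantage + strength * (c - mean_critique)
        for c in critiques
    ]


if __name__ == "__main__":
    # Three steps: informative, neutral, uninformative.
    print(reweight_advantages(1.0, [1, 0, -1], strength=0.5))
```

Under these assumptions, steps with a positive critique receive more credit than the shared outcome signal alone would give them, and steps with a negative critique receive less, while the total credit assigned to the trajectory stays the same.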


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators about easing ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.