On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
Title: Addressing Information Self-Locking in Reinforcement Learning for Active Reasoning in LLM Agents
Abstract:
Reinforcement learning (RL) has emerged as the standard framework for developing LLM-based agents capable of acting, interacting, and reasoning over extended task horizons. However, our investigation into active reasoning, where agents must actively seek new observations by interacting with their environment to accomplish tasks, reveals a critical flaw in outcome-based RL. We identify a systematic failure mode termed "information self-locking" (SeL), in which agents struggle to elicit informative feedback and fail to integrate the evidence they do obtain.
To dissect this problem, we decompose agentic behavior into two interdependent capabilities: Action Selection (AS), which determines the stream of observations the agent receives, and Belief Tracking (BT), which refines the agent's internal understanding of the task. Both theoretical analysis and empirical evidence point to a bidirectional bottleneck that gives rise to SeL. Specifically, deficient BT masks the value of informative actions, while inadequate AS denies BT access to crucial evidence. This reciprocal weakness diminishes the learning signal for both components, ultimately resulting in information self-locking.
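The coupling between AS and BT can be made concrete with a deliberately simple toy model. This is our own illustrative assumption, not the paper's formal analysis. Suppose the task has a binary hidden answer, the agent takes the informative action with probability p (AS quality), and, when it has the evidence, uses it correctly with probability q (BT quality). Otherwise it guesses at 50% accuracy. The success rate is then S = 0.5 + 0.5·p·q, so the signal for improving AS is ∂S/∂p = q/2 and the signal for improving BT is ∂S/∂q = p/2. Both vanish when both capabilities start out weak. The names `success_prob` and `learning_signals` are hypothetical.

```python
def success_prob(p: float, q: float) -> float:
    """Success probability in a hypothetical two-stage toy task.

    p: chance the agent takes the informative action (Action Selection quality).
    q: chance the agent correctly uses evidence it observed (Belief Tracking quality).
    The binary hidden answer is guessed at 50% accuracy whenever evidence is
    missing or ignored.
    """
    with_evidence = q * 1.0 + (1.0 - q) * 0.5
    without_evidence = 0.5
    return p * with_evidence + (1.0 - p) * without_evidence  # equals 0.5 + 0.5*p*q


def learning_signals(p: float, q: float, h: float = 1e-6) -> tuple[float, float]:
    """Central finite-difference partial derivatives of success_prob.

    The first value is the outcome signal for Action Selection (dS/dp),
    the second is the outcome signal for Belief Tracking (dS/dq).
    """
    d_p = (success_prob(p + h, q) - success_prob(p - h, q)) / (2 * h)
    d_q = (success_prob(p, q + h) - success_prob(p, q - h)) / (2 * h)
    return d_p, d_q


if __name__ == "__main__":
    # Weak AS and weak BT: both signals are tiny (the "locked" regime).
    # Strong AS alone does not help BT's partner signal much, and vice versa.
    for p, q in [(0.05, 0.05), (0.5, 0.05), (0.9, 0.9)]:
        d_p, d_q = learning_signals(p, q)
        print(f"AS={p:.2f} BT={q:.2f}  dS/dAS={d_p:.3f}  dS/dBT={d_q:.3f}")
```

In this toy model the gradient for each capability is proportional to the quality of the other, which mirrors the abstract's point that each weakness starves the other of learning signal.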
To address this challenge, we introduce AREW, a straightforward yet powerful Advantage Reweighting technique. AREW leverages readily available directional critiques to redistribute credit throughout trajectories. Our extensive evaluations across nine agentic tasks of diverse complexity demonstrate that AREW effectively alleviates SeL, driving performance improvements of up to 60 points. The source code is accessible at https://github.com/unimpor/T3.
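The abstract does not give AREW's exact formula. As a rough sketch of what advantage reweighting with directional critiques could look like, the code below assumes a GRPO-style group-normalized outcome advantage per trajectory and a per-step critique label in {-1, 0, +1} (e.g. whether a step moved toward or away from informative behavior). Each step's credit is scaled by 1 + α·sign(A)·c. Both the critique labels and this weighting rule are hypothetical and are not the paper's method. With α in [0, 1) the weights stay positive, so the sign of the outcome advantage is preserved while credit shifts toward well-critiqued steps.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized outcome advantages (assumed GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweight(advantage: float, critiques: list[int], alpha: float = 0.5) -> list[float]:
    """Spread one trajectory-level advantage over steps using directional critiques.

    critiques: hypothetical per-step labels, +1 (helpful), 0 (neutral), -1 (harmful).
    Positive advantage: helpful steps get more credit, harmful steps less.
    Negative advantage: harmful steps get more blame, helpful steps less.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) so every weight stays positive")
    sign = 1.0 if advantage > 0 else (-1.0 if advantage < 0 else 0.0)
    return [advantage * (1.0 + alpha * sign * c) for c in critiques]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]                  # outcome rewards for a group of 4 rollouts
    critiques = [[1, 0, -1], [1, -1, -1], [-1, -1, 0], [1, 1, 0]]
    for adv, crit in zip(group_advantages(rewards), critiques):
        print([round(a, 3) for a in reweight(adv, crit)])
```

A multiplicative weight is one simple design choice: it redistributes credit within a trajectory without ever flipping the outcome signal, so a successful rollout is never penalized overall.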
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




