Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning
Abstract:
Group-based reinforcement learning (RL) methods have proven highly effective at improving large language models (LLMs) and are quickly being extended to agentic settings. However, these methods typically rely on coarse, trajectory-level attribution tied to the final outcome. This makes it hard to isolate the contribution of individual steps, particularly when beneficial actions are buried in trajectories that ultimately fail. To address this limitation and enable more accurate step-level credit assignment by exposing this latent information, we introduce Graph-based Group Policy Optimization (GraphGPO). GraphGPO merges all rollout trajectories into a single state-transition graph, then uses the global information in that graph to compute the distance from each state to the task goal. Each edge is then assigned a graph-based advantage, determined by how much the corresponding transition shortens the distance to the goal. With this mechanism, GraphGPO markedly improves training efficiency and achieves state-of-the-art results across a range of challenging benchmarks.
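The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact distance measure or advantage formula, so everything specific here is an assumption made for illustration: states are hashable labels, distance is the shortest hop count to any goal state (found by breadth-first search over reversed edges), the advantage of an edge is the drop in distance across it, and states with no path to the goal receive a capped distance equal to the number of states. The function name `graph_advantages` and its parameters are hypothetical and not taken from the paper.

```python
from collections import defaultdict, deque


def graph_advantages(trajectories, goal_states, unreachable_distance=None):
    """Illustrative sketch only; not the paper's actual formulation.

    trajectories: list of state sequences, one per rollout.
    goal_states: set of states that count as task success.
    unreachable_distance: assumed distance for states with no path to a goal
        (defaults to the number of distinct states).
    Returns (dist, advantages) where advantages maps each edge (s, s_next)
    to dist[s] - dist[s_next].
    """
    # Merge every rollout into one set of directed edges s -> s_next.
    states, edges = set(), set()
    for traj in trajectories:
        states.update(traj)
        for s, s_next in zip(traj, traj[1:]):
            edges.add((s, s_next))

    # Reverse adjacency so we can search backwards from the goal states.
    reverse_adj = defaultdict(set)
    for s, s_next in edges:
        reverse_adj[s_next].add(s)

    # Multi-source BFS: shortest hop count from each state to any goal.
    dist = {g: 0 for g in goal_states if g in states}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for pred in reverse_adj[node]:
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)

    # Assumption: states that never reach a goal get a capped distance.
    cap = len(states) if unreachable_distance is None else unreachable_distance
    for s in states:
        dist.setdefault(s, cap)

    # Edge advantage = how much the transition shortens the distance to goal.
    advantages = {(s, t): dist[s] - dist[t] for s, t in edges}
    return dist, advantages


# One successful rollout and one failed rollout that share a useful prefix.
rollouts = [["A", "B", "C", "G"], ["A", "B", "D"]]
dist, adv = graph_advantages(rollouts, goal_states={"G"})
print(adv[("A", "B")])  # 1: credited even though it also appears in the failed rollout
print(adv[("B", "D")])  # -3: moves away from the goal under the assumed cap
```

In this toy case the shared step A to B receives positive credit in both rollouts, while only the diverging step B to D is penalized, which is the kind of step-level distinction that trajectory-level attribution cannot make.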
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





