arXiv

Agent-R1: A Unified and Modular Framework for Agentic Reinforcement Learning

June 2, 2026 · Mingyue Cheng, Shuo Yu, Daoyu Wang, Qingchuan Li, Xiaoyu Tao, Jie Ouyang, Yucong Luo, Yitong Zhou, Qi Liu, Enhong Chen

Abstract:

Large language models (LLMs) have undergone a rapid transformation, evolving from simple single-turn text generators into the core infrastructure for increasingly sophisticated agents. As these agents assume roles requiring complex reasoning, decision-making, tool utilization, and long-horizon task execution, reinforcement learning (RL) has become pivotal in shaping their behavioral patterns. This trend is particularly pronounced in agentic RL, where models must engage with tools and environments through multiple rounds of interaction, rather than generating isolated, standalone responses.

In this multi-turn context, the traditional view of a trajectory as a continuously expanding token sequence proves insufficient. It makes context evolution rigid and creates mismatches between how a trajectory is represented during rollout and how it is represented during training. To address these challenges, we introduce Agent-R1, a unified and modular framework for agentic RL built on step-level trajectory representation, adaptable context management, and layered interfaces for workflows, environments, and optimization.
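The rollout/training mismatch mentioned above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not Agent-R1's actual API: the `Step` record, the `keep_last` trimming policy, and the `rollout` helper are all hypothetical names introduced here. The idea shown is that once the context is trimmed during rollout, the prompt the policy saw at a step is no longer a prefix of the concatenated history, so storing the per-step context keeps the training input identical to the rollout input.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One interaction step (hypothetical record, not the paper's schema)."""
    context: List[str]   # messages the policy actually saw at this step
    action: str          # model output produced at this step
    observation: str     # tool or environment feedback that followed
    reward: float = 0.0


def keep_last(history: List[str], max_messages: int) -> List[str]:
    """Toy context manager (an assumption): keep only the most recent messages."""
    return history[-max_messages:]


def rollout(actions: List[str], observations: List[str],
            max_messages: int = 3) -> Tuple[List[Step], List[str]]:
    """Replay scripted actions and record the exact context used at each step."""
    history = ["task: find the answer"]
    steps: List[Step] = []
    for action, obs in zip(actions, observations):
        context = keep_last(history, max_messages)
        steps.append(Step(context=list(context), action=action, observation=obs))
        history.extend([action, obs])
    return steps, history


steps, full_history = rollout(
    actions=["call search", "call calc", "final answer"],
    observations=["search result", "calc result", "done"],
)

# At the third step the trimmed context differs from the concatenated prefix,
# so training on the flat sequence would not reproduce the rollout input.
prefix_before_step_3 = full_history[:5]
print(steps[2].context)
print(steps[2].context == prefix_before_step_3)
```

A real context manager would likely be more careful than `keep_last` (for example, by always retaining the task description); the point of the sketch is only that the step record, rather than the flat token sequence, carries the context that was actually used.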

The central premise of Agent-R1 is to treat each interaction step as the fundamental reinforcement-learning transition. Modeling interaction at the step level keeps the optimization layer flexible: the framework accommodates token-level credit assignment, step-level credit assignment, or other compatible mechanisms, and therefore does not depend on any single algorithm. Collectively, these components establish a principled, extensible, and reusable foundation for agentic RL.
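As a rough illustration of how step-level transitions can feed different credit-assignment schemes, the sketch below computes a discounted return per step and then optionally broadcasts each step's value to the tokens of that step's action. This is a generic textbook construction chosen for illustration, not the algorithm used by Agent-R1; the `Transition` type, `step_returns`, `broadcast_to_tokens`, and the discount `gamma` are hypothetical names and parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """One step-level transition (hypothetical fields for illustration)."""
    num_action_tokens: int  # length of the model output at this step
    reward: float           # reward observed after this step


def step_returns(transitions: List[Transition], gamma: float = 1.0) -> List[float]:
    """Step-level credit: discounted return-to-go computed over steps."""
    returns: List[float] = []
    running = 0.0
    for t in reversed(transitions):
        running = t.reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def broadcast_to_tokens(transitions: List[Transition],
                        step_values: List[float]) -> List[List[float]]:
    """Token-level credit (one simple option): copy each step's value to its tokens."""
    return [[v] * t.num_action_tokens for t, v in zip(transitions, step_values)]


# A three-step trajectory with a sparse terminal reward.
trajectory = [
    Transition(num_action_tokens=2, reward=0.0),
    Transition(num_action_tokens=3, reward=0.0),
    Transition(num_action_tokens=1, reward=1.0),
]

per_step = step_returns(trajectory, gamma=0.5)
per_token = broadcast_to_tokens(trajectory, per_step)
print(per_step)   # [0.25, 0.5, 1.0]
print(per_token)  # [[0.25, 0.25], [0.5, 0.5, 0.5], [1.0]]
```

Because both functions consume the same list of transitions, swapping one credit-assignment rule for another leaves the trajectory representation untouched, which is the kind of decoupling the step-level view is meant to allow.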
