MARFT: Multi-Agent Reinforcement Fine-Tuning
Original:
arXiv:2504.16129v5 Announce Type: replace-cross
Abstract: Large Language Model (LLM)-based Multi-Agent Systems (LaMAS) have demonstrated strong capabilities on complex agentic tasks requiring multifaceted reasoning and collaboration, from high-quality presentation generation to scientific research. Meanwhile, Reinforcement Learning (RL) is widely recognized for enhancing agent intelligence, but limited work has studied fine-tuning LaMAS with foundational RL techniques. Directly applying conventional Multi-Agent Reinforcement Learning (MARL) to LaMAS also introduces major challenges due to the unique mechanisms of LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce Flex-MG, a new Markov Game formulation aligned with real-world LaMAS optimization, together with a universal algorithmic framework tailored to LaMAS. We review the evolution from traditional RL to Reinforcement Fine-Tuning (RFT), then analyze the multi-agent counterpart. For LaMAS, we identify key differences between classical MARL and MARFT, including asynchronous agent interactions, profile-aware agent design, and heterogeneous architectures. These differences motivate a LaMAS-oriented formulation of RFT. We present a robust and scalable MARFT framework, detail its modular algorithm, and provide an open-source implementation to support adoption and further research. The paper further discusses application perspectives and open challenges, including dynamic environment modeling, sample inefficiency, and the lack of cohesive frameworks. By connecting theoretical foundations with practical methodology, this work aims to serve as a roadmap for advancing MARFT toward resilient, adaptive, and human-aligned agentic systems. Implementation: https://github.com/jwliao-ai/MARFT.
Rewrite:
Abstract: Large Language Model (LLM)-based Multi-Agent Systems (LaMAS) have shown strong capabilities on complex agentic tasks that demand multifaceted reasoning and collaboration, ranging from generating high-quality presentations to conducting scientific research. Although Reinforcement Learning (RL) is widely recognized for improving agent intelligence, little work has studied fine-tuning LaMAS with foundational RL techniques. Moreover, the distinct mechanisms of LaMAS create major challenges when conventional Multi-Agent Reinforcement Learning (MARL) is applied to them directly. To address these challenges, this study offers a comprehensive examination of LLM-based MARL and proposes Multi-Agent Reinforcement Fine-Tuning (MARFT). The authors introduce Flex-MG, a new Markov Game formulation aligned with real-world LaMAS optimization, together with a universal algorithmic framework tailored to these systems. The paper traces the progression from traditional RL to Reinforcement Fine-Tuning (RFT) and then analyzes its multi-agent counterpart. It identifies key differences between classical MARL and MARFT in the LaMAS setting, including asynchronous agent interactions, profile-aware agent design, and heterogeneous architectures. These differences motivate an RFT formulation oriented toward LaMAS. The study presents a robust and scalable MARFT framework, details its modular algorithm, and releases an open-source implementation to support adoption and further research. The authors also discuss application perspectives and open challenges, including dynamic environment modeling, sample inefficiency, and the lack of cohesive frameworks. By connecting theoretical foundations with practical methodology, this work aims to serve as a roadmap for advancing MARFT toward resilient, adaptive, and human-aligned agentic systems. Implementation: https://github.com/jwliao-ai/MARFT.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




