arXiv

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

June 4, 2026 · Qingxu Fu, Boyin Liu, Shuchang Tao, Zhaoyang Liu, Bolin Ding

Abstract:

This paper introduces AgentJet, a distributed framework for reinforcement learning (RL) of large language model (LLM) agents. Instead of tightly coupling agent rollouts with model optimization, as centralized systems do, AgentJet uses a decoupled multi-node architecture: swarm server nodes host the trainable models and perform GPU-based optimization, while swarm client nodes run agents on diverse devices. This separation enables several capabilities that are difficult to implement in centralized frameworks:

Heterogeneous Multi-Model Reinforcement Learning: It supports training multi-agent teams in which different agents use distinct LLMs as their core reasoning engines.
Multi-Task Cocktail Training: It trains multiple tasks concurrently while keeping each agent in an isolated runtime.
Fault-Tolerant Execution: Failures in external environments do not interrupt the ongoing training process.
Live Code Iteration: Agents can be modified during training simply by replacing swarm client nodes.
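The server/client split described above can be illustrated with a minimal, single-process sketch. It assumes that clients only push finished rollouts to the server and that the server batches them separately per model; `SwarmServer`, `run_client`, `Trajectory`, and `env_step` are hypothetical names, not AgentJet's API, and threads stand in for separate nodes. A crashed environment episode is dropped at the client, so the server never sees the failure.

```python
import queue
import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    model_name: str  # which trainable LLM produced this rollout
    task: str        # task label, so several tasks can share one server
    reward: float


class SwarmServer:
    """Hypothetical server node: owns the trainable models and runs optimization."""

    def __init__(self, batch_size: int = 4):
        self.batch_size = batch_size
        self.inbox = queue.Queue()        # thread-safe channel from clients
        self.buffers = defaultdict(list)  # model_name -> pending trajectories
        self.steps = defaultdict(int)     # model_name -> optimizer steps taken

    def submit(self, traj: Trajectory) -> None:
        # Clients only push finished rollouts; they never touch model weights.
        self.inbox.put(traj)

    def drain(self) -> None:
        # Consume everything received so far and take one placeholder update
        # per full batch, separately for each model.
        while not self.inbox.empty():
            traj = self.inbox.get()
            buf = self.buffers[traj.model_name]
            buf.append(traj)
            if len(buf) == self.batch_size:
                self.steps[traj.model_name] += 1  # stand-in for a GPU update
                buf.clear()


def env_step(episode: int, failing: set) -> float:
    # Simulated external environment; the listed episodes crash.
    if episode in failing:
        raise RuntimeError("external environment crashed")
    return 1.0


def run_client(server: SwarmServer, model_name: str, task: str,
               episodes: int, failing: set) -> None:
    """Hypothetical client node: runs an agent loop and reports rollouts."""
    for ep in range(episodes):
        try:
            reward = env_step(ep, failing)
        except RuntimeError:
            continue  # drop the failed episode; the server is unaffected
        server.submit(Trajectory(model_name, task, reward))


server = SwarmServer(batch_size=4)
clients = [
    threading.Thread(target=run_client, args=(server, "planner-llm", "math", 8, set())),
    threading.Thread(target=run_client, args=(server, "coder-llm", "swe", 9, {3})),
]
for t in clients:
    t.start()
for t in clients:
    t.join()
server.drain()
print(server.steps["planner-llm"], server.steps["coder-llm"])  # 2 2
```

In this sketch the two clients drive two different models on two different tasks, and one environment crash costs only that episode. Because clients hold no model state, a client could be stopped and replaced with edited agent code without touching the server.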

To improve efficiency in complex settings with multiple models, turns, and agents, AgentJet includes a context tracking module with timeline merging. This module consolidates redundant context, yielding training speedups of 1.5x to 10x. In addition, the framework provides an automated research system that, given a research topic, can launch long-horizon, multi-day RL studies on large-scale clusters. Built on the swarm architecture, this system autonomously reproduces the exploratory workflow of an RL researcher and runs without human intervention.
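The abstract does not specify how timeline merging works. One plausible reading, sketched here purely as an assumption, is that in multi-turn rollouts each LLM call's context often extends the previous call's context, so a call whose token sequence is a prefix of a longer one can be folded into that longer timeline and processed once instead of repeatedly. `merge_timelines` and `is_prefix` are hypothetical helpers, and a real trainer would also have to preserve per-turn loss masks, which this sketch omits.

```python
from typing import List

Tokens = List[int]


def is_prefix(a: Tokens, b: Tokens) -> bool:
    """True if token sequence a is a prefix of token sequence b."""
    return len(a) <= len(b) and b[: len(a)] == a


def merge_timelines(samples: List[Tokens]) -> List[Tokens]:
    """Greedily fold each sample into an existing timeline it is a prefix of.

    Longest samples are placed first, so shorter contexts that they extend are
    absorbed instead of being processed again. Per-turn loss masks, which a
    real trainer would need to keep, are omitted here.
    """
    timelines: List[Tokens] = []
    for s in sorted(samples, key=len, reverse=True):
        if not any(is_prefix(s, t) for t in timelines):
            timelines.append(list(s))
    return timelines


system = [1, 2, 3]             # shared system prompt tokens
turn1 = system + [10, 11]      # first LLM call: prompt + reply
turn2 = turn1 + [20, 21, 22]   # second call: adds a tool observation + reply
turn3 = turn2 + [30]           # third call extends the second
other = system + [99, 98]      # another agent's call sharing only the prompt

samples = [turn1, turn2, turn3, other]
merged = merge_timelines(samples)
naive = sum(len(s) for s in samples)
after = sum(len(t) for t in merged)
print(naive, after)  # 27 14
```

On this toy input the merged timelines hold 14 tokens instead of 27. That figure only illustrates the idea; it says nothing about the 1.5x to 10x speedup reported for AgentJet, which depends on the paper's actual workloads and method.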