arXiv

Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents

June 2, 2026 · Wenhao Li, Xiangfeng Wang, Bo Jin

Title: MF-Diffuser: Enabling Offline Multi-Agent Reinforcement Learning at Scale for Thousands of Agents

Abstract: While diffusion-based planning has demonstrated significant success in single-agent offline reinforcement learning, extending these methods to many-agent systems remains a formidable challenge due to the exponential complexity inherent in the joint trajectory space. To address this, we present MF-Diffuser, a novel framework that reformulates trajectory planning within the Wasserstein space of trajectory distributions. By leveraging the principle of propagation of chaos, this approach allows a limited, representative subset of agents to accurately model the dynamics of the entire population. Our method incorporates a value-weighted chaotic entropy objective, which effectively balances the generation of high-fidelity trajectories with the maximization of returns. Additionally, we employ a hierarchical coarse-to-fine strategy that incrementally expands the agent population throughout the denoising process. We derive end-to-end suboptimality bounds consisting of four distinct, interpretable components. These bounds demonstrate that the mean-field approximation error decreases at a rate of $O(H^2/\sqrt{N})$, while offline distribution shift remains independent of the population size $N$. Furthermore, we provide explicit convergence guarantees proving that the resulting policy constitutes an approximate mean-field Nash equilibrium. Empirical evaluations across three standard mean-field RL benchmarks—including stage games, sequential dynamics, and adversarial team competitions—indicate that MF-Diffuser secures the highest returns in most scenarios. The most substantial performance improvements are observed when working with suboptimal offline data and at extreme scales where the number of agents $N$ exceeds $10^3$.
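The $1/\sqrt{N}$ factor in the stated rate can be illustrated with a small numerical sketch. This is not the paper's algorithm: it only shows the standard statistical fact that the empirical mean of $N$ i.i.d. samples deviates from the population mean on the order of $1/\sqrt{N}$, which is the usual intuition behind propagation-of-chaos rates. The function names, the Gaussian agent states, and the reading of $H$ as a planning horizon are all assumptions made for illustration; the abstract does not define $H$ or give constants.

```python
import math
import random


def mean_field_gap(num_agents, num_trials=200, seed=0):
    """Average |empirical mean - true mean| for N i.i.d. agent states.

    Assumption: agent states are i.i.d. standard normal (true mean 0).
    This is a toy stand-in, not the MF-Diffuser model.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(num_agents)) / num_agents
        total += abs(sample_mean)
    return total / num_trials


def bound_scale(horizon, num_agents):
    """H^2 / sqrt(N) with the unknown constant dropped (H assumed to be a horizon)."""
    return horizon ** 2 / math.sqrt(num_agents)


# Growing the population 100x should shrink the gap by roughly 10x.
gap_small = mean_field_gap(100)
gap_large = mean_field_gap(10_000)
ratio = gap_small / gap_large
print(f"gap(N=100)={gap_small:.4f}  gap(N=10000)={gap_large:.4f}  ratio={ratio:.1f}")
```

Under these assumptions, the same scaling applied to the bound gives `bound_scale(10, 100) == 10.0` and `bound_scale(10, 10_000) == 1.0`: a hundredfold increase in agents reduces the approximation term tenfold at a fixed horizon.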

Source: arXiv