arXiv

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

June 2, 2026 · Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

Abstract: Constrained Multi-Agent Reinforcement Learning (CMARL) is hindered by two interconnected difficulties: the exponential expansion of the joint action space as agent count increases, and the complex coupling of agents imposed by constraints that go beyond simple reward structures. To tackle these issues, we propose Coordination Graphs for Constrained Multi-Agent Reinforcement Learning (CG-CMARL), a novel framework that integrates coordination graphs with Lagrangian duality. This approach breaks down the joint decision-making problem into pairwise interactions, managed by a set of shared Q-functions—one dedicated to the main objective and others to individual constraints. Consequently, the quantity of models required for learning remains constant, regardless of the number of agents. During inference, the Max-Sum message passing algorithm facilitates action coordination across the factor graph, while Lagrangian multipliers manage the balance between objectives and constraints. This mechanism enables a single trained model to explore the entire Pareto front without the need for retraining. We establish convergence guarantees under reasonable assumptions and derive a compositional error bound that isolates distinct, interpretable error sources, each linked to specific design elements and independently adjustable. Empirical evaluations on cooperative navigation scenarios, involving teams of up to 10 agents tasked with reaching target locations while adhering to pairwise constraints, demonstrate that our method generates Pareto fronts that outperform established baselines, which are typically trained at fixed reward-shaping ratios. Furthermore, the approach scales effectively to team sizes where centralized methods are computationally infeasible.
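To make the inference step described in the abstract concrete, the following is a minimal sketch of Max-Sum action selection over Lagrangian-weighted pairwise payoffs. It is not the authors' implementation. It assumes a single constraint expressed as a cost (so the combined payoff is the objective minus `lam` times the constraint), and it uses random per-edge tables in place of the learned shared Q-functions. The function names `scalarize`, `max_sum`, and `brute_force`, and the 4-agent chain graph, are hypothetical. On a tree-structured graph such as this chain, Max-Sum recovers the exact joint maximizer, which the brute-force search can confirm for small teams.

```python
import itertools
import numpy as np

def scalarize(q_obj, q_con, lam):
    """Per-edge Lagrangian payoff: objective minus lam-weighted constraint cost.
    (Sign convention is an assumption; the abstract does not specify it.)"""
    return {e: q_obj[e] - lam * q_con[e] for e in q_obj}

def max_sum(n_agents, n_actions, payoff, n_iters=10):
    """Synchronous Max-Sum on a pairwise graph; payoff[(i, j)][a_i, a_j]."""
    neighbors = {i: [] for i in range(n_agents)}
    for i, j in payoff:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def table(i, j):
        # Return the edge payoff oriented as [a_i, a_j].
        return payoff[(i, j)] if (i, j) in payoff else payoff[(j, i)].T

    # msg[(i, j)][a_j] is the message sent from agent i to agent j.
    msg = {(i, j): np.zeros(n_actions) for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msg:
            # Sum of messages into i from every neighbor except j.
            incoming = sum((msg[(k, i)] for k in neighbors[i] if k != j),
                           np.zeros(n_actions))
            m = np.max(table(i, j) + incoming[:, None], axis=0)
            new[(i, j)] = m - m.mean()  # normalize to keep values bounded
        msg = new

    # Each agent picks the action maximizing the sum of its incoming messages.
    return [int(np.argmax(sum((msg[(k, i)] for k in neighbors[i]),
                              np.zeros(n_actions))))
            for i in range(n_agents)]

def brute_force(n_agents, n_actions, payoff):
    """Exhaustive search over joint actions; only feasible for small teams."""
    best = max(itertools.product(range(n_actions), repeat=n_agents),
               key=lambda a: sum(q[a[i], a[j]] for (i, j), q in payoff.items()))
    return list(best)

# Stand-in tables for a 4-agent chain with 3 actions per agent.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3)]
q_obj = {e: rng.normal(size=(3, 3)) for e in edges}
q_con = {e: rng.random(size=(3, 3)) for e in edges}

# Sweeping lam re-weights the same tables, with no retraining involved.
for lam in (0.0, 0.5, 2.0):
    print(lam, max_sum(4, 3, scalarize(q_obj, q_con, lam)))
```

Changing `lam` only re-weights fixed tables before message passing, which mirrors the abstract's point that one trained model can be swept across trade-offs at inference time. The brute-force search costs `n_actions ** n_agents` evaluations, while each Max-Sum iteration costs on the order of `n_actions ** 2` per edge.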

Source: arXiv