arXiv

Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning

June 2, 2026 · Dogan Urgun, Gokhan Gungor

Abstract:

Designing effective auxiliary rewards for cooperative multi-agent systems is difficult: misaligned incentives can lead to suboptimal coordination, especially when sparse task rewards provide too little guidance for collaborative behavior. To address this, we present an autonomous reward design framework that uses large language models (LLMs) to generate executable reward programs from environment instrumentation. Our approach restricts candidate programs to a formal validity envelope and uses Multi-Agent Proximal Policy Optimization (MAPPO) to train policies from scratch under a fixed computational budget. Each candidate reward is evaluated by the performance of the policies trained with it, and selection across generations is driven exclusively by sparse task returns. We evaluate the framework on four Overcooked-AI layouts that differ in corridor congestion, handoff dependencies, and structural asymmetries. The proposed method consistently improves task returns and delivery counts, with the largest gains in environments characterized by interaction bottlenecks. Diagnostic analysis of the synthesized shaping components indicates stronger interdependence in action selection and better signal alignment in coordination-intensive tasks. These findings show that LLM-guided reward search reduces reliance on manual engineering while producing shaping signals compatible with cooperative learning under limited computational resources.
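
The generate, validate, train, and select loop described in the abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `Candidate`, `within_envelope`, and `search` are hypothetical, the envelope check (finite, bounded output on probe inputs) is a stand-in for the paper's formal validity envelope, and `propose` and `train_and_evaluate` are caller-supplied placeholders for the LLM call and for MAPPO training followed by sparse-return evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]            # assumed form of environment instrumentation
RewardFn = Callable[[Features], float]  # an executable reward program


@dataclass
class Candidate:
    name: str
    reward_fn: RewardFn
    sparse_return: float = float("-inf")


def within_envelope(fn: RewardFn, probes: List[Features], bound: float = 1.0) -> bool:
    """Hypothetical validity check: the program must run on every probe
    input and return a finite value whose magnitude is at most `bound`."""
    for features in probes:
        try:
            value = float(fn(features))
        except Exception:
            return False  # program crashed or returned a non-numeric value
        if value != value or abs(value) > bound:  # NaN, infinite, or too large
            return False
    return True


def search(
    propose: Callable[[List[Candidate]], List[Candidate]],
    train_and_evaluate: Callable[[RewardFn], float],
    probes: List[Features],
    generations: int = 3,
    keep: int = 2,
) -> Optional[Candidate]:
    """Generational search: propose candidates, discard those outside the
    envelope, score the rest, and keep the top `keep` by sparse return."""
    survivors: List[Candidate] = []
    for _ in range(generations):
        # `propose` stands in for the LLM; it sees the current survivors.
        pool = [c for c in propose(survivors) if within_envelope(c.reward_fn, probes)]
        for candidate in pool:
            # Stands in for training from scratch under a fixed budget and
            # measuring the sparse task return of the resulting policies.
            candidate.sparse_return = train_and_evaluate(candidate.reward_fn)
        # Selection uses the sparse task return only, never the shaped reward.
        ranked = sorted(survivors + pool, key=lambda c: c.sparse_return, reverse=True)
        survivors = ranked[:keep]
    return survivors[0] if survivors else None
```

In this sketch a candidate that fails the envelope check is never trained, so no training budget is spent on programs that crash or produce unbounded values. The function returns `None` when no proposed candidate passes the check.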

Source: arXiv