arXiv

Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration

June 2, 2026 · Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Heming Zou, Chang Liu, Cheems Wang, Meiqin Liu, Xiangyang Ji

Title: Directing the Path: LLMs Identify Critical States to Streamline Multi-Agent Exploration

Abstract:

Vast state-action spaces remain a persistent obstacle to efficient multi-agent exploration in reinforcement learning. Recent work increasingly drives agents toward novelty, diversity, or uncertainty, but such unguided exploration wastes much of its effort on redundant behavior. To address this, we present LEMAE, a systematic framework for Efficient Multi-Agent Exploration that leverages informative, task-relevant guidance from Large Language Models (LLMs). Our method translates linguistic knowledge from LLMs into symbolic key states (essential milestones for task completion) through a discriminative approach that minimizes inference costs. To make full use of these key states, we introduce the Subspace-based Hindsight Intrinsic Reward (SHIR), which steers agents toward key states by increasing reward density. We further introduce the Key State Memory Tree (KSMT), which tracks transitions between key states within a given task and thereby supports organized exploration. By significantly reducing redundant exploration, LEMAE outperforms current state-of-the-art methods on demanding benchmarks such as SMAC and MPE, delivering substantial performance gains and up to a 10x speedup in certain scenarios.
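To make the idea of symbolic key states more concrete, the following is a minimal illustrative sketch and not the LEMAE implementation. It assumes a toy "fetch the key, open the door" task, two hand-written discriminator functions standing in for what an LLM might emit, a flat bonus granted when a key state is first reached (a deliberate simplification, not SHIR itself), and a small tree of key-state sequences loosely inspired by KSMT. All names (`has_key`, `door_open`, `relabel_rewards`, `KeyStateNode`, `record_episode`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Discriminator = Callable[[State], bool]


# Hypothetical discriminators: each maps a state to True/False for one key state.
def has_key(state: State) -> bool:
    return state["has_key"] > 0.5


def door_open(state: State) -> bool:
    return state["door_open"] > 0.5


KEY_STATES: Dict[str, Discriminator] = {"has_key": has_key, "door_open": door_open}


def first_hits(
    trajectory: List[State], key_states: Dict[str, Discriminator]
) -> List[Tuple[int, str]]:
    """Return (timestep, name) for the first time each key state is satisfied."""
    hits: List[Tuple[int, str]] = []
    seen = set()
    for t, state in enumerate(trajectory):
        for name, discriminate in key_states.items():
            if name not in seen and discriminate(state):
                seen.add(name)
                hits.append((t, name))
    return hits


def relabel_rewards(
    trajectory: List[State], key_states: Dict[str, Discriminator], bonus: float = 1.0
) -> List[float]:
    """Assumed simplification: add a flat bonus at each first key-state hit."""
    rewards = [0.0] * len(trajectory)
    for t, _ in first_hits(trajectory, key_states):
        rewards[t] += bonus
    return rewards


@dataclass
class KeyStateNode:
    """Node in a tree whose root-to-node paths are observed key-state sequences."""
    visits: int = 0
    children: Dict[str, "KeyStateNode"] = field(default_factory=dict)


def record_episode(root: KeyStateNode, hits: List[Tuple[int, str]]) -> None:
    """Walk the tree along the episode's key-state order, counting visits."""
    node = root
    node.visits += 1
    for _, name in hits:
        node = node.children.setdefault(name, KeyStateNode())
        node.visits += 1


# Toy episode: the key is picked up at t=1 and the door is opened at t=3.
trajectory = [
    {"has_key": 0.0, "door_open": 0.0},
    {"has_key": 1.0, "door_open": 0.0},
    {"has_key": 1.0, "door_open": 0.0},
    {"has_key": 1.0, "door_open": 1.0},
]
hits = first_hits(trajectory, KEY_STATES)
rewards = relabel_rewards(trajectory, KEY_STATES)
root = KeyStateNode()
record_episode(root, hits)
```

In this sketch the discriminators are evaluated on stored trajectories rather than by querying an LLM at every step, which is one plausible reading of how a discriminative formulation could keep inference costs low. The visit counts in the tree could then be used to prefer rarely explored key-state sequences, though that policy is not shown here.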
