arXiv

Multi$^2$: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments

June 3, 2026 · Sangeun Park, Minhae Kwon

Title: Multi$^2$: Enhancing Hierarchical Multi-Agent Decision-Making in Interactive Settings via LLM-Based Agents

Abstract: A primary objective in large language model (LLM) research is the development of agentic systems that can plan, execute actions, and adapt through continuous engagement with dynamic environments. Although contemporary LLM-based agents demonstrate remarkable contextual reasoning capabilities, their decision-making over extended horizons remains unstable and frequently suffers from "objective drift," a phenomenon in which goals and plans diverge during prolonged interactions. To address this, we present Multi$^2$, a hierarchical multi-agent framework that structurally decomposes agent behavior into distinct, complementary roles. In this architecture, a high-level agent (System 1) generates context-aware sub-goals and is trained via supervised fine-tuning (SFT), while a low-level agent (System 2) performs atomic actions and is trained via offline-to-online reinforcement learning (RL) within interactive settings. This architectural separation facilitates stable long-horizon control, reduces the risk of objective drift, and supports efficient adaptation. Empirical results across various interactive environments show that Multi$^2$ consistently surpasses strong agentic baselines, exhibiting superior coordination and robustness in multi-turn interactions. Furthermore, to address a significant gap in the training and evaluation of hierarchical decision-making for LLM-based agents, we introduce and publicly release three new hierarchical benchmark datasets.
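
To make the division of roles concrete, the following is a minimal sketch of a two-level control loop in which a high-level policy proposes sub-goals and a low-level policy emits atomic actions toward the current sub-goal. It is not the Multi$^2$ implementation: `ToyEnv`, `high_level_policy`, `low_level_policy`, and `run_episode` are hypothetical names, both policies are hand-written rules standing in for the SFT-trained and RL-trained LLM agents, and the one-dimensional environment is an assumption chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyEnv:
    """Hypothetical 1-D environment: move an integer position to a target."""
    target: int = 5
    pos: int = 0

    def step(self, action: str) -> Tuple[int, bool]:
        # Apply one atomic action and report (new position, task solved?).
        if action == "right":
            self.pos += 1
        elif action == "left":
            self.pos -= 1
        return self.pos, self.pos == self.target


def high_level_policy(pos: int, target: int, stride: int = 2) -> int:
    """Stand-in for the sub-goal generator: a waypoint at most `stride` away."""
    delta = max(-stride, min(stride, target - pos))
    return pos + delta


def low_level_policy(pos: int, subgoal: int) -> str:
    """Stand-in for the action executor: one atomic step toward the sub-goal."""
    return "right" if subgoal > pos else "left"


def run_episode(
    env: ToyEnv, max_subgoals: int = 10, max_steps_per_subgoal: int = 5
) -> List[Tuple[int, List[str]]]:
    """Alternate sub-goal proposal and action execution until the task is solved."""
    trace: List[Tuple[int, List[str]]] = []
    pos, done = env.pos, env.pos == env.target
    for _ in range(max_subgoals):
        if done:
            break
        # The high-level agent only sees the state and the overall goal.
        subgoal = high_level_policy(pos, env.target)
        actions: List[str] = []
        for _ in range(max_steps_per_subgoal):
            if pos == subgoal:
                break
            # The low-level agent only sees the state and the current sub-goal.
            action = low_level_policy(pos, subgoal)
            pos, done = env.step(action)
            actions.append(action)
        trace.append((subgoal, actions))
    return trace


if __name__ == "__main__":
    for subgoal, actions in run_episode(ToyEnv(target=5)):
        print(f"sub-goal {subgoal}: {actions}")
```

In this sketch the low-level policy never sees the final target, only the current sub-goal, which is one simple way to picture how a structural split can keep short-horizon action selection tied to a fixed intermediate objective.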

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC