COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
Abstract
Integrating world models into language agents lets them forecast environment dynamics and assess candidate actions before execution. However, existing textual world models typically remain static after training, so they cannot adapt to the on-policy state-action distributions produced by a continuously improving agent. In addition, methods for improving agent performance often depend on external rewards or verification mechanisms, which limits their usefulness in practical interactive settings. To address these challenges, we introduce COMAP, a framework that co-evolves textual world models and agent policies through closed-loop interaction. At each decision step, the world model predicts the future state feedback for proposed actions. The agent then performs future-aware reflection, judging the credibility of this feedback and revising its actions to improve outcomes. The resulting on-policy trajectories are used to update the world model via self-distillation, aligning it more closely with the agent’s shifting interaction patterns. Experiments on benchmarks for embodied task planning, web navigation, and tool use show that COMAP consistently outperforms strong baselines, achieving a relative improvement of +16.75% with Qwen3-4B. Further analysis shows that this co-evolutionary cycle progressively improves the world model’s prediction precision and supports better long-horizon decision-making. Our code is available at: https://github.com/loyiv/CoMAP.
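The closed loop described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation: `ToyWorldModel`, `reflect`, `run_round`, and the `ENV` table are hypothetical names invented for illustration, the world model is a lookup table rather than an LLM, and a simple memorization step stands in for the self-distillation update, whose details the abstract does not give.

```python
from typing import Dict, List, Tuple

Transition = Tuple[str, str, str]  # (state, action, observed feedback)

# Hypothetical toy environment: textual feedback for each (state, action) pair.
ENV: Dict[Tuple[str, str], str] = {
    ("start", "open_door"): "door is locked",
    ("start", "pick_key"): "you hold the key",
}


class ToyWorldModel:
    """Stand-in for a textual world model (assumption: a lookup table, not an LLM)."""

    def __init__(self) -> None:
        self.memory: Dict[Tuple[str, str], str] = {}

    def predict(self, state: str, action: str) -> str:
        # Predicted feedback for a proposed action; "unknown" if never observed.
        return self.memory.get((state, action), "unknown")

    def update(self, trajectory: List[Transition]) -> None:
        # Placeholder for the self-distillation update: memorize the
        # feedback observed on the agent's own (on-policy) trajectory.
        for state, action, feedback in trajectory:
            self.memory[(state, action)] = feedback


def reflect(state: str, candidates: List[str], world_model: ToyWorldModel) -> str:
    """Toy future-aware reflection: skip actions whose predicted feedback looks bad."""
    for action in candidates:
        predicted = world_model.predict(state, action)
        if "locked" not in predicted:
            return action
    return candidates[0]  # fall back to the first proposal


def run_round(world_model: ToyWorldModel, state: str = "start") -> List[Transition]:
    """One closed-loop round: predict, reflect, act, then update the world model."""
    candidates = ["open_door", "pick_key"]
    action = reflect(state, candidates, world_model)
    feedback = ENV[(state, action)]
    trajectory = [(state, action, feedback)]
    world_model.update(trajectory)
    return trajectory
```

In this toy setting, the first call to `run_round` tries `open_door` because the world model has no prediction yet. After the update, the second call predicts the locked door and switches to `pick_key`, which mirrors in miniature how the world model and the agent's choices are meant to improve together.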
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




