arXiv

Joint Agent Memory and Exploration Learning via Novelty Signals

June 2, 2026 · Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen, Guohong Liu, Yuebing Song, Jiacheng Liu, Yuchen Li, Dawei Yin, Ting Cao, Yunxin Liu, Yuanchun Li

Title: Enhancing Agent Exploration and Memory Through Novelty-Driven Learning

Original: arXiv:2606.01528v1. Abstract: In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By utilizing deterministic and persistent novelty signals such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that JAMEL successfully generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.

Rewrite: In open-ended environments, autonomous agents depend on exploration, yet current language model agents struggle with it. Effective exploration requires memory, but storing raw interaction histories becomes computationally expensive over long trajectories. Latent memory can compress these histories, but its training lacks reliable supervisory signals. To address this, we present Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains an agent's memory and exploration policy together through novelty-driven interaction.

We observe that memory and exploration form a mutually dependent loop: sustained exploration needs memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction supplies the supervision that makes memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the graphical user interface (GUI) domain, we obtain natural, annotation-free supervision for the memory module. Experiments show that JAMEL generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. The code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
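To make the idea of a deterministic, persistent novelty signal concrete, the following is a minimal sketch of how code coverage could be turned into a per-step score. It is not taken from the JAMEL codebase: the class name `CoverageNovelty`, the `reward` method, and the activity-method identifiers are all hypothetical, and the sketch assumes that each step reports the set of code units it executed.

```python
from typing import Iterable, Set


class CoverageNovelty:
    """Hypothetical tracker: scores each step by the code units it covers for the first time."""

    def __init__(self) -> None:
        # Persistent record of everything covered so far in the episode.
        self.seen: Set[str] = set()

    def reward(self, covered_now: Iterable[str]) -> int:
        # Only units never covered before count as novel.
        new_units = set(covered_now) - self.seen
        self.seen |= new_units
        return len(new_units)


# Illustrative identifiers only; a real signal would come from an instrumented app.
tracker = CoverageNovelty()
r1 = tracker.reward({"MainActivity.onCreate", "MainActivity.onClick"})
r2 = tracker.reward({"MainActivity.onClick", "SettingsActivity.onCreate"})
r3 = tracker.reward({"MainActivity.onClick"})
print(r1, r2, r3)
```

Under these assumptions, repeating an already exercised action earns nothing, so an agent can only keep scoring if it remembers which behaviors are exhausted. That is the dependence between memory and exploration that the abstract describes.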

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC