arXiv

Episodic Memory Temporal Consistency for Cooperative Multi-Agent Reinforcement Learning

June 4, 2026 · Zicheng Zhao, Yu Lan, Chengzhengxu Li, Zhaohan Zhang, Xiaoming Liu

Title: Ensuring Temporal Consistency in Episodic Memory for Cooperative Multi-Agent Reinforcement Learning

Abstract:

Cooperative Multi-Agent Reinforcement Learning (MARL) is often hindered by sparse rewards and limited exploration. Episodic memory strategies alleviate these problems by leveraging high-return trajectories, but they can inadvertently cause agents to settle into local optima, because unconstrained incentive distribution and semantic representation collapse undermine performance. To overcome these obstacles, we introduce Episodic Memory Temporal Consistency (EMTC), a robust framework for constructing and selectively using historical experiences.

EMTC comprises two complementary components. First, it employs a Temporally Consistent Semantic Embedder, which combines contrastive learning with time-conditioned state reconstruction. This approach prevents representation collapse and facilitates accurate memory retrieval. Second, the framework features a Temporal Consistency Gating Mechanism that dynamically adjusts episodic incentives according to temporal consistency errors. By filtering out misleading cues from trajectories that appear successful but are flawed, this adaptive gate effectively reduces Q-value overestimation.
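The gating idea can be illustrated with a minimal sketch. The abstract does not give the gate's functional form, so everything here is an assumption made for illustration: the exponential gate, the `temperature` parameter, and the function names `consistency_gate` and `gated_incentive` are hypothetical, and the temporal consistency error is assumed to be a non-negative scalar produced elsewhere (for example, by the embedder's reconstruction step).

```python
import math


def consistency_gate(error: float, temperature: float = 0.1) -> float:
    """Map a temporal consistency error to a weight in (0, 1].

    Assumption: an exponential decay is used purely for illustration;
    the paper's actual gate may take a different form.
    """
    if error < 0:
        raise ValueError("temporal consistency error must be non-negative")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(-error / temperature)


def gated_incentive(env_reward: float, episodic_bonus: float,
                    error: float, temperature: float = 0.1) -> float:
    """Add the episodic bonus to the environment reward, scaled by the gate.

    A trajectory with a large consistency error contributes almost none
    of its episodic bonus, so a misleading memory has little influence
    on the learning target.
    """
    return env_reward + consistency_gate(error, temperature) * episodic_bonus


if __name__ == "__main__":
    # A consistent memory (error 0) passes its full bonus through.
    print(gated_incentive(env_reward=1.0, episodic_bonus=0.5, error=0.0))
    # An inconsistent memory (error 1.0) is almost entirely suppressed.
    print(gated_incentive(env_reward=1.0, episodic_bonus=0.5, error=1.0))
```

In this sketch a smaller `temperature` makes the gate stricter, so fewer episodic incentives survive; how EMTC actually sets this trade-off is not stated in the abstract.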

We establish theoretical guarantees for the framework, deriving a strict error bound that connects observable temporal consistency errors to both representation quality and the optimality of the underlying trajectory. Comprehensive evaluations on the Google Research Football (GRF) and StarCraft Multi-Agent Challenge (SMAC) benchmarks show that EMTC consistently surpasses state-of-the-art baselines. Specifically, compared with the leading episodic baseline, EMTC improves win rate by up to 24% in super-hard SMAC scenarios and by 28% on average across GRF tasks.

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC
