Episodic Memory Temporal Consistency for Cooperative Multi-Agent Reinforcement Learning
Title: Ensuring Temporal Consistency in Episodic Memory for Cooperative Multi-Agent Reinforcement Learning
Abstract:
Cooperative multi-agent reinforcement learning (MARL) is often hindered by sparse rewards and limited exploration. Episodic memory methods alleviate these problems by exploiting high-return trajectories, but they can also trap agents in local optima, because episodic incentives are distributed without constraint and the semantic representations used for memory retrieval can collapse. To overcome these obstacles, we introduce Episodic Memory Temporal Consistency (EMTC), a robust framework for constructing and selectively using historical experiences.
EMTC comprises two complementary components. First, it employs a Temporally Consistent Semantic Embedder, which combines contrastive learning with time-conditioned state reconstruction. This approach prevents representation collapse and facilitates accurate memory retrieval. Second, the framework features a Temporal Consistency Gating Mechanism that dynamically adjusts episodic incentives according to temporal consistency errors. By filtering out misleading cues from trajectories that appear successful but are flawed, this adaptive gate effectively reduces Q-value overestimation.
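The gating idea can be illustrated with a small sketch. The abstract does not give EMTC's actual formulas, so everything below is an assumption made for illustration: the mean-squared-error form of the temporal consistency error, the exponential gate, the one-step target, and the names `temporal_consistency_error`, `gate`, `gated_target`, `beta`, and `temperature` are all hypothetical. The sketch only shows how an episodic incentive could shrink as the consistency error grows.

```python
import math


def temporal_consistency_error(predicted_state, actual_state):
    """Mean squared error between a reconstructed state and the true state.

    Assumption: the error is an MSE; the source does not define it.
    """
    if len(predicted_state) != len(actual_state):
        raise ValueError("state vectors must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted_state, actual_state)]
    return sum(squared) / len(actual_state)


def gate(error, temperature=1.0):
    """Map a non-negative error to a weight in (0, 1].

    Assumption: an exponential gate; a larger error gives a smaller weight.
    """
    return math.exp(-error / temperature)


def gated_target(reward, next_q, memory_return, current_q, error,
                 gamma=0.99, beta=0.1, temperature=1.0):
    """One-step TD target plus a gated episodic incentive (hypothetical form).

    The incentive is the positive gap between the return stored in memory
    and the current Q estimate, scaled down when the error is large.
    """
    incentive = max(memory_return - current_q, 0.0)
    return reward + gamma * next_q + beta * gate(error, temperature) * incentive


# Usage: a consistent memory entry contributes its full incentive,
# while an inconsistent one is strongly attenuated.
consistent = gated_target(0.0, 1.0, 2.0, 1.0, error=0.0, gamma=0.5)
inconsistent = gated_target(0.0, 1.0, 2.0, 1.0, error=5.0, gamma=0.5)
```

Under these assumptions, a trajectory that looks successful but has a large consistency error adds almost nothing to the target, which is one way such a gate could limit Q-value overestimation.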
We establish theoretical guarantees for the framework, deriving a strict error bound that connects observable temporal consistency errors to both the quality of representations and the optimality of the underlying trajectory. Comprehensive evaluations on the Google Research Football (GRF) and StarCraft Multi-Agent Challenge (SMAC) benchmarks show that EMTC consistently surpasses state-of-the-art baselines. Specifically, compared with the leading episodic baseline, EMTC improves win rates by up to 24% in super-hard SMAC scenarios and by 28% on average across GRF tasks.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC


