Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations
Title: Momento: Assessing Persistent Memory and Reasoning in Multi-Session Agentic Interactions
Abstract:
While recent breakthroughs in agentic AI have empowered systems to tackle intricate tasks via reasoning, tool utilization, and multi-step planning, current evaluation methods remain limited. Existing benchmarks typically confine agents to single-session contexts, thereby overlooking critical elements such as historical actions, expressed preferences, and previous decisions that are essential for addressing personalized user objectives.
To address this limitation, we present Momento, a novel benchmark for evaluating persistent agentic task completion in multi-session service settings. The benchmark requires agents to execute consequential, tool-mediated actions while managing temporal dependencies and adapting to user goals that shift across sessions. Our experiments indicate that contemporary agents fail primarily because they misjudge the user's current state. Specifically, these models often treat prior session history as a reliable indicator of current context rather than as potentially outdated information that requires re-validation. This finding highlights a significant gap between the current capabilities of AI agents and the requirements of realistic, long-horizon human-agent interactions.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





