arXiv

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Abstract

Current memory benchmarks for large language models (LLMs) often lack long-term semantic consistency across the dialogue sessions they evaluate, and the personas they use tend to be rigid and one-dimensional. Moreover, real-world user-assistant interactions draw on a wider range of heterogeneous data sources, such as emails and documents, which existing evaluations largely omit. These gaps undermine both the realism and the effectiveness of current evaluation methods.

To address these challenges, we present RHELM (Realistic, Heterogeneous, and Evolving Long-term Memory). Using carefully constructed user profiles and a novel LOOP (pLan-rOllout-evOlve-Prune) module, we generate realistic dialogues across diverse interaction scenarios that evolve over time while remaining coherent over the long term. A key feature of this approach is the tight integration of these dialogues with heterogeneous external sources, which are synchronized with the user's temporal event trajectory.
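The abstract names the four LOOP stages without specifying them. The Python below is a minimal sketch, under stated assumptions, of how a plan-rollout-evolve-prune loop could keep dialogue sessions and one external source (here, email) aligned with a single dated event trajectory. Every name in it (`Profile`, `Event`, `Session`, `plan`, `rollout`, `evolve`, `prune`, `loop`) is hypothetical, each stage is a toy string-based stand-in rather than RHELM's actual generator, and the prune step is assumed to be simple de-duplication plus time ordering.

```python
from dataclasses import dataclass, field


# Hypothetical data types; none of these names come from the RHELM paper.
@dataclass
class Event:
    day: int
    description: str


@dataclass
class Session:
    day: int
    source: str  # assumed source labels, e.g. "dialogue" or "email"
    text: str


@dataclass
class Profile:
    name: str
    goals: list
    history: list = field(default_factory=list)  # events the user has lived through so far


def plan(profile: Profile, step: int) -> list:
    """Plan (toy): lay out the user's goals as a dated event trajectory."""
    return [Event(day=i * step, description=g) for i, g in enumerate(profile.goals)]


def rollout(profile: Profile, event: Event) -> list:
    """Rollout (toy): emit a dialogue session plus an external artifact dated the
    same day, so the heterogeneous source stays synchronized with the trajectory."""
    context = "; ".join(profile.history) or "nothing yet"
    return [
        Session(event.day, "dialogue", f"{profile.name}: {event.description} (earlier: {context})"),
        Session(event.day, "email", f"Re: {event.description}"),
    ]


def evolve(profile: Profile, event: Event) -> Profile:
    """Evolve (toy): fold the event into the profile so later sessions can refer back to it."""
    profile.history.append(event.description)
    return profile


def prune(sessions: list) -> list:
    """Prune (assumed): drop verbatim-duplicate sessions and keep the timeline in day order."""
    seen, kept = set(), []
    for s in sorted(sessions, key=lambda x: x.day):  # stable sort keeps same-day order
        key = (s.source, s.text)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept


def loop(profile: Profile, step: int = 7) -> list:
    """Run plan once, then rollout and evolve per event, then prune the full timeline."""
    timeline = []
    for event in plan(profile, step):
        timeline.extend(rollout(profile, event))
        profile = evolve(profile, event)
    return prune(timeline)
```

For example, `loop(Profile("Alice", ["start a new job", "move to Berlin"]))` yields four sessions on days 0 and 7, and the day-7 dialogue refers back to the earlier job change. That kind of cross-session dependency is what a long-term memory benchmark can later probe with questions.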

The resulting benchmark contains challenging question-answer pairs spanning seven question types. Each question is mapped to at least one of 27 memory characteristics that we identify as essential yet underexplored in existing research. Extensive experiments with full-context models, retrieval-augmented generation (RAG) methods, and representative memory frameworks show that current approaches still struggle in complex, realistic settings, particularly with aggregating information across multiple sources and with contextual reasoning in practical scenarios.
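As a minimal sketch of how such a benchmark item and its per-category scoring might be represented, the Python below enforces the two constraints stated above (exactly one of seven question types; at least one of 27 memory characteristics) and reports accuracy grouped by question type or by characteristic. The names `QAItem` and `accuracy_by`, the integer IDs standing in for the unnamed types and characteristics, and the case-insensitive exact-match metric are all assumptions, not the paper's evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass

NUM_QUESTION_TYPES = 7     # from the abstract
NUM_CHARACTERISTICS = 27   # from the abstract


@dataclass(frozen=True)
class QAItem:
    """Hypothetical item schema; type and characteristic IDs are placeholders."""
    question: str
    answer: str
    qtype: int                  # 1..7 (actual type names are not given in the abstract)
    characteristics: frozenset  # non-empty subset of 1..27

    def __post_init__(self):
        if not 1 <= self.qtype <= NUM_QUESTION_TYPES:
            raise ValueError(f"qtype must be in 1..{NUM_QUESTION_TYPES}")
        if not self.characteristics:
            raise ValueError("each question must map to at least one characteristic")
        if not all(1 <= c <= NUM_CHARACTERISTICS for c in self.characteristics):
            raise ValueError(f"characteristic ids must be in 1..{NUM_CHARACTERISTICS}")


def accuracy_by(items, predictions, key):
    """Exact-match accuracy (an assumed metric) grouped by key(item), which returns
    an iterable of group ids, so one question can count toward several groups."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    hits, totals = defaultdict(int), defaultdict(int)
    for item, pred in zip(items, predictions):
        correct = pred.strip().lower() == item.answer.strip().lower()
        for group in key(item):
            totals[group] += 1
            hits[group] += int(correct)
    return {g: hits[g] / totals[g] for g in sorted(totals)}
```

Passing `key=lambda it: [it.qtype]` gives a per-type breakdown, while `key=lambda it: it.characteristics` credits each answer to every characteristic the question exercises, which is one way to surface weaknesses such as multi-source aggregation separately from other skills.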

Source: arXiv | Generated at: 2026-06-02 00:00:00 UTC