arXiv

M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

Title: M$^3$Eval: Evaluating Multi-Modal Memory via Cognitively-Inspired Video Tasks

As multi-modal models increasingly tackle long-form video comprehension, memory has become a pivotal capability. Although significant progress has been made in building video datasets and benchmarks, current research largely prioritizes perception and reasoning and leaves memory without systematic assessment. In particular, little is known about what information models retain, how faithfully they preserve it, and how robust their memory remains under interference.

To bridge this gap, we present M$^3$Eval, the first comprehensive framework and benchmark designed to probe multiple dimensions of memory in multi-modal models. Drawing on principles from cognitive psychology, we design tasks that each isolate a specific memory component. Using M$^3$Eval, we conduct extensive experiments on a range of representative multi-modal models and uncover consistent weaknesses as well as distinctive behavioral patterns.

Our analysis yields several key findings: models struggle to maintain separate representations when processing concurrent video streams; they exhibit interference patterns that differ markedly from those of human memory; they attribute memories to their source more accurately in spatial contexts than in temporal ones; and their symbolic memory is limited. Taken together, these results position the benchmark as a resource for future research, highlight memory as a foundational yet under-studied capability, and offer insights for designing better memory mechanisms in multi-modal systems. The code and dataset are available at https://pku-value-lab.github.io/m3eval-homepage.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
