arXiv

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

Title: MBench: A Holistic Benchmark for Assessing Memory in Video World Models

Abstract:

While recent breakthroughs in video-based world models have showcased an exceptional capacity to generate high-fidelity visual sequences, a significant disconnect remains between producing visually plausible content and meeting the functional demands of a true world model. Specifically, maintaining a stable and logical internal state over extended periods remains a challenge. Current evaluation frameworks predominantly focus on visual aesthetics, motion smoothness, and alignment between text and video, often neglecting memory—the essential function that allows a world model to uphold consistency across long timeframes and intricate interactions.

To address this gap, we introduce MBench, a specialized benchmark designed to measure and assess the memory capabilities of video world models. We break down memory into three hierarchical, complementary dimensions: entity consistency, environment consistency, and causal consistency. These dimensions are further subdivided into 12 quantifiable metrics to provide a thorough characterization of long-term memory retention. The benchmark relies on carefully curated, real-world long-form video data and uses both rule-based quantitative metrics and Vision-Language Models (VLMs) to ensure objective and comprehensive consistency evaluation. Our extensive testing of leading state-of-the-art video world models exposes profound systemic weaknesses in long-term state retention, offering the community a standardized evaluation tool and a clear pathway for future research.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
