arXiv

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

Title: SuperMemory-VQA: A Benchmark for Long-Horizon Egocentric Visual Question Answering

Abstract:

AI-enabled eyewear offers a promising foundation for deploying artificial intelligence as a personalized memory aid. To be genuinely useful, such systems must go beyond analyzing brief video segments and help bridge the memory gaps that arise in practical, personal, and social contexts across extended periods of egocentric footage. Most existing egocentric datasets instead focus on action recognition or generic question answering over short clips. As a result, they test perceptual skills rather than the complex memory needs people have in daily life. To fill this gap, we present SuperMemory-VQA, a new benchmark for evaluating AI assistants on practical, long-horizon memory challenges.

The dataset comprises 52.9 hours of daily activities captured with AI glasses, with synchronized data streams that include RGB video, audio transcriptions, eye-tracking metrics, IMU readings, and SLAM trajectories. Using a rigorous, human-verified annotation process, we developed 4,853 grounded question-answer pairs. These items cover a diverse range of memory types: object and location retention, intent and visual scene recall, timeline reconstruction, conversational history, and in-context retrieval. To assess robustness to hallucination, every question is formatted as a multiple-choice item that includes a distinct "unanswerable" option.
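The abstract does not specify the item schema or the scoring metrics. As a minimal sketch, assuming each item stores its answer options (including the "unanswerable" option) and that a model's prediction is an option index, the following shows how such items could be scored so that choosing a content answer on an unanswerable question counts as a hallucination. The `MCQItem` class, the `score` function, the `UNANSWERABLE` label, and the metric names are all hypothetical illustrations, not the benchmark's released format or official metrics.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical label for the abstention option (assumption, not the dataset's format).
UNANSWERABLE = "unanswerable"


@dataclass(frozen=True)
class MCQItem:
    """One multiple-choice question (hypothetical schema)."""
    question: str
    options: tuple[str, ...]  # must contain the UNANSWERABLE option
    answer_index: int         # index of the correct option
    memory_type: str          # e.g. "object/location", "timeline", "conversation"

    def unanswerable_index(self) -> int:
        return self.options.index(UNANSWERABLE)


def score(items: Sequence[MCQItem], predictions: Sequence[int]) -> dict[str, float]:
    """Return accuracy plus two illustrative abstention-related rates."""
    if not items or len(items) != len(predictions):
        raise ValueError("need a non-empty item list and one prediction per item")
    correct = 0
    hallucinated = 0    # content answer chosen when the gold answer is "unanswerable"
    n_unanswerable = 0
    over_abstained = 0  # "unanswerable" chosen when a content answer exists
    n_answerable = 0
    for item, pred in zip(items, predictions):
        unans = item.unanswerable_index()
        correct += pred == item.answer_index
        if item.answer_index == unans:
            n_unanswerable += 1
            hallucinated += pred != unans
        else:
            n_answerable += 1
            over_abstained += pred == unans
    return {
        "accuracy": correct / len(items),
        "hallucination_rate": hallucinated / n_unanswerable if n_unanswerable else 0.0,
        "over_abstention_rate": over_abstained / n_answerable if n_answerable else 0.0,
    }


if __name__ == "__main__":
    # Invented example items for illustration only; not taken from the benchmark.
    items = [
        MCQItem("Where did I leave my keys?",
                ("kitchen counter", "car", UNANSWERABLE), 0, "object/location"),
        MCQItem("What did Sam say about Friday?",
                ("dinner plans", "a meeting", UNANSWERABLE), 2, "conversation"),
    ]
    print(score(items, [0, 1]))
    # {'accuracy': 0.5, 'hallucination_rate': 1.0, 'over_abstention_rate': 0.0}
```

Reporting the hallucination and over-abstention rates separately from accuracy is one way to make the "unanswerable" option informative: a system that always guesses and a system that always abstains can both score poorly on accuracy while failing in opposite directions.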

Our benchmarking of state-of-the-art agentic frameworks and large language model backbones indicates that current systems still fall well short of the reliability that real-world memory tasks demand. This gap underscores the need for new architectures grounded in AI memory that restrict responses to cases where sufficient evidence exists. Furthermore, a participant survey confirms that the benchmark questions are realistic, useful, and well aligned with the memory demands of everyday life.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
