arXiv

Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation

June 2, 2026 · Alexander Martin, William Walden, Reno Kriz, Dengjia Zhang, Kate Sanders, Eugene Yang, Chihsheng Jin, Benjamin Van Durme

Title: Dissecting MiRAGE: A Framework for Assessing Multimodal Retrieval-Augmented Generation

Abstract: This paper presents MiRAGE, a novel evaluation framework designed for Retrieval-Augmented Generation (RAG) systems that operate on multimodal data. With audiovisual content increasingly serving as a primary source of information on the internet, it is crucial for RAG architectures to effectively incorporate insights from these diverse media types into their generated outputs. However, current evaluation methodologies remain predominantly text-focused, which restricts their effectiveness in multimodal contexts.

To address this gap, MiRAGE adopts a claim-centric strategy for assessing multimodal RAG performance. The framework comprises two metrics: InfoF1, which measures factuality and information coverage, and CiteF1, which measures citation support and completeness. When applied by human evaluators, MiRAGE correlates strongly with extrinsic judgments of output quality.
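As a rough illustration of how a claim-centric F1 score of this kind could be computed, the sketch below takes per-claim boolean judgments and combines a precision-like and a recall-like rate with the standard harmonic mean. The mapping is an assumption, not taken from the paper: InfoF1 is read here as factuality (precision) plus coverage (recall), and CiteF1 as citation support (precision) plus citation completeness (recall). The function names `f1` and `claim_f1` and the input format are hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def claim_f1(precision_judgments: list[bool], recall_judgments: list[bool]) -> float:
    """Combine two lists of per-claim judgments into one F1 score.

    Assumed reading (not confirmed by the source):
      InfoF1: precision_judgments = "is each generated claim factual?"
              recall_judgments    = "is each reference claim covered?"
      CiteF1: precision_judgments = "does each citation support its claim?"
              recall_judgments    = "is each claim that needs a citation cited?"
    """
    # Fraction of True judgments in each list; an empty list counts as 0.0.
    precision = sum(precision_judgments) / len(precision_judgments) if precision_judgments else 0.0
    recall = sum(recall_judgments) / len(recall_judgments) if recall_judgments else 0.0
    return f1(precision, recall)


# Hypothetical judgments: 3 of 4 generated claims are factual,
# and 1 of 2 reference claims is covered by the output.
info_score = claim_f1([True, True, False, True], [True, False])
```

In this sketch the judgments are plain booleans supplied by the caller, so the same function works whether they come from human annotators or from an automated judge.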

Furthermore, we provide an automated version of MiRAGE alongside multimodal adaptations of three widely used text-based RAG metrics: ALCE, ARGUE, and RAGAS. These experiments highlight the limitations of existing text-centric approaches and lay the groundwork for automated evaluation in this domain. We release open-source implementations and describe how to evaluate multimodal RAG systems.
