X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
Title: X-Stream: Investigating MLLMs as Multiplexers for Multi-Stream Comprehension
Abstract: Although video streaming analysis has advanced substantially, practical applications such as live sports coverage, self-driving vehicles, and collaborative multi-screen setups require continuous engagement with multiple data streams at once. Yet current evaluation frameworks are restricted to single-stream models, leaving a significant gap in the assessment of real-time, cross-stream reasoning. To address this gap, we present X-Stream, the first benchmark designed specifically for multi-stream streaming comprehension. It comprises 4,220 carefully selected question-answer pairs derived from 932 videos and assesses performance across 11 distinct subtasks in multi-window, multi-view, and multi-device contexts. The dataset is constructed with a novel dual-verification process, which mitigates the risk of models depending too heavily on any single stream. We also propose treating multi-modal large language models (MLLMs) as basic multiplexers and analyze their efficacy through the framework of Signal Multiplexing Theory. Our comprehensive online inference tests highlight a pressing issue: leading MLLMs struggle with simultaneous streams, scoring only around 50% and showing weak proactive capabilities. By revealing the limitations of present multiplexing approaches, X-Stream offers both a robust evaluation methodology and empirical insights to guide the development of next-generation multi-stream agents.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





