arXiv

A Systematic Evaluation of Positional Bias in Multi-Video Summarization with MLLMs

Title: Assessing Positional Bias in Multi-Video Summarization via MLLMs: A Systematic Study

Abstract: Multimodal Large Language Models (MLLMs) are gaining traction for video comprehension, but their performance and reliability on multiple video inputs are not yet well characterized. This study investigates positional bias in multi-video summarization: the fidelity of an individual video’s summary can vary with the video’s position in the input sequence even when its content is unchanged. To examine this phenomenon, we built a benchmark using videos from ActivityNet and news sources, spanning the Cooking, Domestic, Leisure, and News categories, with two-video and four-video configurations. We evaluated nine MLLMs, both open-source and proprietary, using three complementary metrics: Coverage, Directional Positional Bias (DPB), and Middle-Edge Gap (MEG). Positional effects vary significantly across domains and models. In particular, a small signed directional bias can coexist with significant underperformance of videos placed in middle positions. Furthermore, expanding the visual or generation budget does not consistently eliminate this imbalance. We also explore prompt-level mitigation strategies. Together, these results show that multi-video summarization is highly sensitive to input ordering and protocol, underscoring the need for more robust, order-invariant multimodal systems.
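The abstract names the three metrics but does not define them, so the following is only a rough sketch of how position-dependent scores could be aggregated. It assumes a per-video coverage score in [0, 1] is already available for every input ordering. The functions `position_means`, `directional_gap`, and `middle_edge_gap` are hypothetical stand-ins chosen for illustration, not the paper's definitions of Coverage, DPB, or MEG, and the scores are synthetic.

```python
from itertools import permutations
from statistics import mean


def position_means(scores_by_order):
    """Average score at each input position, pooled over all orderings.

    scores_by_order maps an ordering (tuple of video ids) to a dict
    {video_id: score} measured when the videos were presented in that order.
    """
    n = len(next(iter(scores_by_order)))
    per_position = [[] for _ in range(n)]
    for order, scores in scores_by_order.items():
        for pos, vid in enumerate(order):
            per_position[pos].append(scores[vid])
    return [mean(vals) for vals in per_position]


def directional_gap(pos_means):
    """Assumed signed gap: first-position mean minus last-position mean."""
    return pos_means[0] - pos_means[-1]


def middle_edge_gap(pos_means):
    """Assumed gap: mean of the two edge positions minus mean of the middle ones."""
    if len(pos_means) < 3:
        raise ValueError("need at least three positions to have a middle")
    edges = mean([pos_means[0], pos_means[-1]])
    middle = mean(pos_means[1:-1])
    return edges - middle


# Synthetic example: four videos whose score depends only on position,
# with the two middle slots scoring lower than the edges.
videos = ("a", "b", "c", "d")
score_at_position = [0.8, 0.5, 0.5, 0.8]  # made-up numbers, not paper results
toy_scores = {
    order: {vid: score_at_position[pos] for pos, vid in enumerate(order)}
    for order in permutations(videos)
}

toy_means = position_means(toy_scores)
toy_directional = directional_gap(toy_means)
toy_middle_edge = middle_edge_gap(toy_means)
```

In this synthetic setup the signed first-versus-last gap is zero while the middle positions still score lower than the edges, which is the kind of pattern the abstract describes when it says a small directional bias can coexist with middle-position underperformance.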


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
