arXiv

Answer Self-Consistency with Margin-Triggered Question Re-Arbitration for the CVPR 2026 VidLLMs Challenge

This report describes our approach to Track 2 of the CVPR 2026 VidLLMs Challenge, which evaluates visual relational reasoning in video. Participants must build models that infer relationships not explicitly visible in the visual data. To address this, we introduce Answer Self-Consistency with Margin-Triggered Question Re-Arbitration (ASC-MQRA), a training-free test-time reasoning framework built on a multimodal reasoning model.

The core component, ASC, performs multiple stochastic runs of video question answering and aggregates the resulting answer choices through answer-level self-consistency. This significantly outperforms standard single-pass inference, and ASC is the core of our final test submission.
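As an illustration of answer-level aggregation, the following is a minimal sketch that assumes each stochastic run returns a single option letter and that aggregation is a plain majority vote. The function name `aggregate_votes` and the sample answers are hypothetical and are not taken from the released code.

```python
from collections import Counter


def aggregate_votes(answers):
    """Return the most frequent answer choice and the full vote counts.

    `answers` is a list of option letters, one per stochastic run.
    Ties are broken by first occurrence, which is how Counter.most_common
    orders equal counts (an arbitrary choice made for this sketch).
    """
    counts = Counter(answers)
    winner, _ = counts.most_common(1)[0]
    return winner, counts


# Hypothetical outputs of five stochastic runs on one question.
samples = ["B", "A", "B", "C", "B"]
winner, counts = aggregate_votes(samples)
print(winner, dict(counts))
```

The vote counts are returned alongside the winner because the distribution itself, and not only the top answer, is what an uncertainty check would inspect.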

We also investigate MQRA, a conditional module that re-arbitrates questions whose initial results indicate uncertainty, identified by a low-margin vote distribution. Our analysis shows that low-margin examples frequently retain the ground-truth answer among their top candidates. This motivates MQRA to narrow the candidate set and prompt the model to re-examine only the video segments associated with the retained options. On the validation set, MQRA improved further over ASC, which suggests that low-margin vote distributions are an effective uncertainty signal. On the test set, however, it led to a slight performance decline. This degradation suggests that re-arbitration is highly sensitive to the size and category distribution of the subset that triggers re-evaluation.
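The margin trigger and candidate retention can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the margin is assumed to be the gap between the top two vote counts divided by the number of runs, and the threshold `0.2` and the retained set size `k=2` are hypothetical values chosen for the example.

```python
from collections import Counter


def vote_margin(answers):
    """Normalized gap between the two most-voted options (assumed definition)."""
    ranked = Counter(answers).most_common()
    top = ranked[0][1]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return (top - second) / len(answers)


def needs_rearbitration(answers, threshold=0.2):
    """Flag a question as uncertain when its vote margin is below the threshold."""
    return vote_margin(answers) < threshold


def retained_candidates(answers, k=2):
    """Keep the k most-voted options as the narrowed candidate set."""
    return [option for option, _ in Counter(answers).most_common(k)]


# Hypothetical votes: two options tie, so the margin is zero.
uncertain = ["A", "B", "A", "B", "C"]
confident = ["D", "D", "D", "D", "D"]

print(vote_margin(uncertain), needs_rearbitration(uncertain))
print(retained_candidates(uncertain))
print(vote_margin(confident), needs_rearbitration(confident))
```

Only the flagged questions would be sent back to the model with the narrowed option list, so the number of re-arbitrated questions depends directly on the threshold.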

Consequently, our final test submission uses ASC alone, without the re-arbitration step. This configuration achieved an average accuracy of 72.73 and a category-wise macro average accuracy of 78.34 on the validation set. On the test set, it achieved an average accuracy of 81.16 and a category-wise macro average accuracy of 80.91. This report covers our prompting methodology, implementation details, ablation studies, and diagnostic analyses. The source code is available at https://github.com/data-analytics-labo/ASC-MQRA.
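The two reported metrics can differ because average accuracy weights every question equally, while the category-wise macro average weights every category equally. The sketch below computes both under that standard reading. The category names and records are invented for illustration and are not challenge data.

```python
from collections import defaultdict


def accuracy_metrics(records):
    """Compute (average accuracy, category-wise macro average), both in percent.

    `records` is an iterable of (category, is_correct) pairs.
    """
    per_category = defaultdict(list)
    for category, is_correct in records:
        per_category[category].append(bool(is_correct))

    total = sum(len(flags) for flags in per_category.values())
    correct = sum(sum(flags) for flags in per_category.values())
    average = 100.0 * correct / total

    # Mean of per-category accuracies: each category counts once.
    category_accs = [sum(flags) / len(flags) for flags in per_category.values()]
    macro = 100.0 * sum(category_accs) / len(category_accs)
    return average, macro


# Hypothetical records: a large, hard category and a small, easy one.
records = [
    ("spatial", True), ("spatial", False), ("spatial", False), ("spatial", False),
    ("causal", True),
]
average, macro = accuracy_metrics(records)
print(average, macro)
```

In this toy case the macro average is higher than the average accuracy because the small category is answered correctly. A gap in that direction is consistent with, but does not by itself explain, the validation figures above.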


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
