REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment
Title: REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment
Abstract: Knowledge-intensive Visual Question Answering (KI-VQA) frequently suffers from knowledge conflicts, a direct consequence of the limitations of open-domain retrieval. Existing methods lack two things: robust, generalizable mechanisms for detecting conflicts, and intra-model constraints for handling contradictory evidence. To address these gaps, we introduce REAL (Reasoning-Pivot Alignment), a framework built on the concept of the "reasoning-pivot." Unlike standard reasoning steps, which rely mainly on internal self-derivation, a reasoning-pivot is an atomic component of the reasoning chain (either a node or an edge) that marks a knowledge linkage and typically depends on external evidence to complete the reasoning. Supported by the newly constructed REAL-VQA dataset, our method uses Reasoning-Pivot Aware SFT (RPA-SFT) to train a generalizable discriminator by aligning conflict identification with pivot extraction. It further applies Reasoning-Pivot Guided Decoding (RPGD), an intra-model decoding strategy that uses these pivots to mitigate conflicts in a targeted manner. Experiments across multiple datasets show that REAL substantially improves discrimination accuracy and achieves superior overall performance, confirming the effectiveness of our pivot-driven resolution paradigm.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




