MIND: A Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models
Title: MIND: A Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models
Abstract:
Multimodal large language models (MLLMs) have seen widespread adoption in reasoning applications. Nevertheless, these models often struggle with inadequate logical resilience, a lack of robust multi-rationale semantic modeling, and a vulnerability to distracting signals. To address these challenges, we introduce the Multi-rationale INtegrated Discriminative (MIND) reasoning framework. This approach aims to equip MLLMs with the human-like cognitive abilities to "Understand, Rethink, and Correct", thereby shifting the paradigm from passive, imitation-based reasoning to active, discriminative reasoning.
Our methodology incorporates several key components. First, we establish a Rationale Augmentation and Discrimination (RAD) paradigm to serve as a unified and scalable data foundation. Second, we implement a Progressive Two-stage Correction Learning (P2CL) strategy. The initial stage focuses on enhancing positive learning across multiple rationales, while the subsequent stage facilitates active logical discrimination and correction. Furthermore, to reduce representation entanglement within the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy.
Comprehensive experimental results demonstrate that MIND delivers state-of-the-art (SOTA) performance across various public datasets. The associated data and code are publicly accessible at https://github.com/YuChuang1205/MIND.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



