Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training
Title: Med-Scout: Addressing Geometric Blindness in Medical MLLMs Through Geometry-Aware Reinforcement Learning Post-Training
Abstract:
While Multimodal Large Language Models (MLLMs) have demonstrated remarkable linguistic capabilities in medical diagnosis, they exhibit a significant perceptual limitation: geometric blindness. We find that current state-of-the-art MLLMs fail to anchor their outputs in objective geometric constraints, which leads to plausible but factually incorrect hallucinations. This issue stems from training approaches that emphasize linguistic fluency at the expense of geometric accuracy.
To resolve this, we introduce Med-Scout, a novel framework that uses Reinforcement Learning (RL) to exploit the inherent geometric logic present in unlabeled medical images. Rather than depending on expensive expert annotations, Med-Scout generates verifiable supervision signals through three proxy tasks that mimic the systematic reading and reasoning habits of clinicians: Anomaly Consistency Detection, Topological Jigsaw Reconstruction, and Hierarchical Scale Localization.
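To illustrate how a proxy task can yield a verifiable reward from an unlabeled image, the following is a minimal sketch of a jigsaw-style task. It is not the Med-Scout implementation: the abstract does not specify how Topological Jigsaw Reconstruction is built or scored, so the function names (`make_jigsaw`, `jigsaw_reward`), the square grid layout, and the per-position accuracy reward are all assumptions made for illustration.

```python
import random


def make_jigsaw(image, grid, rng):
    """Split a 2D image (list of rows) into grid x grid tiles and shuffle them.

    Hypothetical helper. Returns (shuffled_tiles, permutation), where
    shuffled_tiles[i] is the original tile at row-major index permutation[i].
    """
    h, w = len(image), len(image[0])
    th, tw = h // grid, w // grid  # tile size; remainder pixels are dropped
    tiles = [
        [row[c * tw:(c + 1) * tw] for row in image[r * th:(r + 1) * th]]
        for r in range(grid)
        for c in range(grid)
    ]
    permutation = list(range(len(tiles)))
    rng.shuffle(permutation)
    return [tiles[i] for i in permutation], permutation


def jigsaw_reward(predicted, permutation):
    """Assumed reward: fraction of positions whose source index is correct.

    The ground truth comes from the shuffle itself, so no expert label is needed.
    """
    if sorted(predicted) != list(range(len(permutation))):
        return 0.0  # not a valid permutation of tile indices: no reward
    hits = sum(p == t for p, t in zip(predicted, permutation))
    return hits / len(permutation)


rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "scan"
shuffled, perm = make_jigsaw(image, grid=2, rng=rng)
```

In an RL loop, the model would be shown `shuffled` and asked to output the source index of each tile; `jigsaw_reward` would then score that output against `perm` automatically. The same pattern (construct the task from the image, keep the ground truth, score the answer programmatically) could plausibly apply to the other two proxy tasks, though their details are likewise not given here.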
We also introduce Med-Scout-Bench, a dedicated benchmark designed to rigorously measure this specific perceptual deficit. Our extensive evaluations demonstrate that Med-Scout effectively alleviates geometric blindness, surpassing top proprietary and open-source MLLMs by more than 40% on our benchmark. Moreover, this improved geometric perception enhances broader medical comprehension, yielding superior performance in both radiological and comprehensive medical Visual Question Answering (VQA) tasks.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC






