Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA
Title: Ask4VG: Mitigating Prior-Driven Responses in Medical VQA Through Risk-Aware Question Selection
Abstract: In medical visual question answering (VQA), models must ground their answers in concrete visual evidence, because answers without such support can lead to misleading interpretations. However, generic, template-based, or structurally similar questions are common, and they encourage models to rely on question-answer shortcuts instead of image-dependent reasoning, which raises the likelihood of hallucinated outputs. To address this, we introduce Ask4VG, a label-free pilot framework for risk-aware question selection. Ask4VG estimates the hallucination risk of a given question through counterfactual visual probing: the model is asked the same question under four image conditions (the original image, a perturbed version, a blank image, and a mismatched image). The relationships among the resulting answers are converted into weak supervision signals for training a counterfactual risk estimator. Before final answer generation, this estimator reranks candidate question rewrites, favoring those that preserve the original intent but whose answers are less invariant to absent or mismatched visual evidence.
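The probing-and-reranking loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `probe`, `weak_risk_label`, `rerank`, `toy_model`, and the image identifiers are hypothetical. The weak label used here (the fraction of evidence-free conditions, blank and mismatched, that reproduce the original answer) is an assumed rule; the perturbed answer is collected but not used in it. In the framework a trained estimator predicts risk for candidate rewrites; in this toy demo the probe-derived label stands in for that estimator.

```python
from typing import Callable, Dict, List, Sequence

# The four image conditions named in the abstract.
CONDITIONS = ("original", "perturbed", "blank", "mismatched")


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def probe(vqa_model: Callable[[str, str], str], question: str,
          images: Dict[str, str]) -> Dict[str, str]:
    """Ask the same question under every image condition."""
    return {c: vqa_model(question, images[c]) for c in CONDITIONS}


def weak_risk_label(answers: Dict[str, str]) -> float:
    """Hypothetical rule: fraction of evidence-free conditions (blank,
    mismatched) whose answer equals the original answer, i.e. how
    invariant the answer is to missing or wrong visual evidence."""
    ref = normalize(answers["original"])
    evidence_free = ("blank", "mismatched")
    return sum(normalize(answers[c]) == ref for c in evidence_free) / len(evidence_free)


def rerank(rewrites: Sequence[str], predict_risk: Callable[[str], float],
           preserves_intent: Callable[[str], bool]) -> List[str]:
    """Keep intent-preserving rewrites, ordered from lowest to highest predicted risk."""
    kept = [q for q in rewrites if preserves_intent(q)]
    return sorted(kept, key=predict_risk)


# Hypothetical toy model: one phrasing triggers a prior-driven "yes"
# regardless of the image; the other phrasing actually reads the image.
def toy_model(question: str, image: str) -> str:
    if question.startswith("Is there"):
        return "yes"
    return "yes" if "lesion" in image else "no"


# Hypothetical image identifiers for the four conditions.
images = {"original": "ct_lesion", "perturbed": "ct_lesion_noisy",
          "blank": "blank", "mismatched": "other_patient"}

ranked = rerank(
    ["Is there a lesion?", "Does the image show a lesion?", "Is the scan normal?"],
    # Stand-in for the trained estimator: probe and label directly.
    predict_risk=lambda q: weak_risk_label(probe(toy_model, q, images)),
    # Crude keyword check standing in for an intent-preservation filter.
    preserves_intent=lambda q: "lesion" in q,
)
print(ranked)  # ['Does the image show a lesion?', 'Is there a lesion?']
```

Sorting by ascending risk puts first the rewrites whose answers change when the image is blanked or swapped. A practical selector would need an intent check far stronger than the keyword filter used here.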
Experiments on the VQA-RAD dataset with Qwen2-VL-2B-Instruct show that simple prompt-based rewriting actually increases counterfactual risk, whereas predicted-risk reranking lowers held-out risk from 0.658 to 0.623 and raises exact-match accuracy from 0.337 to 0.356. A 300-sample external check on PMC-VQA confirms a consistent reduction in risk, accompanied by a modest accuracy gain. These findings indicate that question selection is a useful complement to existing methods for mitigating response-level hallucinations and can improve the reliability of medical VQA systems.
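For scale, the reported VQA-RAD changes work out as follows (absolute differences, with relative change against the baseline value; this is plain arithmetic on the numbers above, not an additional result):

```latex
\begin{aligned}
\Delta_{\text{risk}} &= 0.623 - 0.658 = -0.035 &&\quad (-0.035 / 0.658 \approx -5.3\%) \\
\Delta_{\text{acc}}  &= 0.356 - 0.337 = +0.019 &&\quad (+0.019 / 0.337 \approx +5.6\%)
\end{aligned}
```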
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





