arXiv

Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA

Title: Ask4VG: Mitigating Prior-Driven Responses in Medical VQA Through Risk-Aware Question Selection

Abstract: In medical visual question answering (VQA), models must base their responses on concrete visual evidence, because answers lacking such support can lead to misleading interpretations. However, the prevalence of generic, template-based, or structurally similar questions often encourages models to rely on question-answer shortcuts rather than image-dependent reasoning, which raises the likelihood of hallucinated outputs. To address this, we introduce Ask4VG, a label-free pilot framework for risk-aware question selection. Ask4VG quantifies the hallucination risk associated with a specific question via counterfactual visual probing: the model is queried with the same question under four image conditions (the original image, a perturbed version, a blank image, and a mismatched image). The relationships among the resulting answers are then converted into weak supervision signals to train a counterfactual risk estimator. Before final answer generation, the trained estimator reranks candidate rewrites of the question, prioritizing those that preserve the original intent but whose answers are more sensitive to missing or mismatched visual evidence.
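The counterfactual probing step can be illustrated with a minimal Python sketch. The abstract does not say how the four answers are combined into a risk signal, so the scoring rule below (the fraction of image-free conditions whose answer matches the original-image answer) is an assumption made for illustration. The names `normalize`, `probe_signals`, and the condition keys are hypothetical.

```python
from typing import Dict

# Hypothetical condition keys for the four probes described in the abstract.
CONDITIONS = ("original", "perturbed", "blank", "mismatched")


def normalize(answer: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def probe_signals(answers: Dict[str, str]) -> Dict[str, float]:
    """Turn the four answers to one question into simple weak signals.

    Assumption (not from the paper): risk is the fraction of the
    blank / mismatched conditions whose answer equals the answer on the
    original image, i.e. how often the answer ignores the image.
    """
    missing = [c for c in CONDITIONS if c not in answers]
    if missing:
        raise KeyError(f"missing conditions: {missing}")

    ref = normalize(answers["original"])
    same_blank = normalize(answers["blank"]) == ref
    same_mismatch = normalize(answers["mismatched"]) == ref
    # A mild perturbation should ideally leave the answer unchanged.
    stable = normalize(answers["perturbed"]) == ref

    return {
        "risk": (int(same_blank) + int(same_mismatch)) / 2.0,
        "stable_under_perturbation": float(stable),
    }


example = {
    "original": "Yes.",
    "perturbed": "yes",
    "blank": "Yes",
    "mismatched": "No",
}
signals = probe_signals(example)
```

In this toy case the answer survives a blank image but changes for a mismatched one, so the illustrative risk is 0.5. Signals of this kind could serve as targets for a learned estimator that takes only the question text as input.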

Experimental evaluations on the VQA-RAD dataset using Qwen2-VL-2B-Instruct demonstrate that while simple prompt-based rewriting actually increases counterfactual risk, employing predicted-risk reranking effectively lowers held-out risk from 0.658 to 0.623. Additionally, this approach boosts exact accuracy from 0.337 to 0.356. Further validation through a 300-sample external check on PMC-VQA confirms a consistent reduction in risk, accompanied by a modest improvement in accuracy. These findings indicate that strategic question selection serves as a valuable complement to existing methods for mitigating response-level hallucinations, thereby enhancing the reliability of medical VQA systems.
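Predicted-risk reranking can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `predict_risk` stands in for the trained counterfactual risk estimator, `keeps_intent` stands in for whatever intent-preservation check is used, and both are hypothetical callables supplied by the caller. Keeping the original question in the candidate pool is also an assumption, made so that a rewrite is chosen only when its predicted risk is lower.

```python
from typing import Callable, List, Tuple


def rerank_by_predicted_risk(
    original: str,
    rewrites: List[str],
    predict_risk: Callable[[str], float],
    keeps_intent: Callable[[str, str], bool],
) -> List[Tuple[str, float]]:
    """Return candidate questions sorted by ascending predicted risk.

    The original question is always a candidate; rewrites that fail the
    (hypothetical) intent check are dropped before scoring.
    """
    candidates = [original] + [
        q for q in rewrites if q != original and keeps_intent(original, q)
    ]
    scored = [(q, predict_risk(q)) for q in candidates]
    # sorted() is stable, so the original wins ties with later rewrites.
    return sorted(scored, key=lambda pair: pair[1])


# Toy stand-ins for the learned estimator and the intent check.
toy_risk = {
    "Is there an abnormality?": 0.70,
    "Is there a fracture visible in the left femur on this X-ray?": 0.40,
    "What color is the sky?": 0.10,
}
ranked = rerank_by_predicted_risk(
    original="Is there an abnormality?",
    rewrites=list(toy_risk)[1:],
    predict_risk=lambda q: toy_risk[q],
    keeps_intent=lambda orig, q: "sky" not in q,
)
selected_question = ranked[0][0]
```

Here the off-topic rewrite is filtered out despite its low toy score, and the more specific rewrite is selected over the generic original.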


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
