arXiv

Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study

Abstract: The robustness of Medical Vision-Language Models (VLMs) in non-English clinical environments remains largely unexamined, as these systems are predominantly assessed using English-based radiology visual question answering (VQA) benchmarks. To address this gap, we present IndoRad-VQA, an Indonesian variant of the VQA-RAD dataset, designed to test whether medical VLMs maintain their radiological reasoning capabilities when questions are posed in Bahasa Indonesia. To ensure the preservation of clinical meaning, terminology consistency, and answer equivalence, radiology question-answer pairs were translated into Indonesian, with self-evaluation employed as a quality control mechanism. Our study evaluates a range of models—including general-purpose, Southeast Asian multilingual, and medical-specific VLMs—under both English and Indonesian prompting conditions. In addition to measuring accuracy, we quantify the "language robustness gap" between the two languages and perform an error analysis to pinpoint specific failure modes, such as yes/no flips, laterality errors, and mismatches in output language. Our results indicate that high performance on English medical VQA benchmarks does not guarantee reliable behavior in Indonesian clinical settings. Depending on the evaluation metric, we observed a performance disparity ranging from 8 to 25 percent between English and Indonesian inputs. These findings underscore the necessity for more inclusive, multilingual evaluations of medical multimodal foundation models. The dataset is accessible at https://huggingface.co/datasets/Lab-IS/IndoRad-VQA.
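The "language robustness gap" and the yes/no flip failure mode mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give exact definitions, so everything here is an assumption: the gap is taken to be English accuracy minus Indonesian accuracy under exact-match scoring, a flip is taken to be a question where the model answers "yes" in one language and "no" in the other, and the names `PairedResult`, `robustness_gap`, `yes_no_flips`, and the `YES_NO_MAP` table are hypothetical and not taken from the paper or the dataset.

```python
from dataclasses import dataclass

# Illustrative mapping (an assumption, not from the paper): fold Indonesian
# yes/no answers onto their English equivalents before comparing.
YES_NO_MAP = {"yes": "yes", "no": "no", "ya": "yes", "tidak": "no"}


@dataclass
class PairedResult:
    """One question answered under both prompting conditions (hypothetical schema)."""
    gold: str     # reference answer
    pred_en: str  # model answer when prompted in English
    pred_id: str  # model answer when prompted in Indonesian


def normalize(answer: str) -> str:
    """Lowercase, trim, drop a trailing period, and map ya/tidak to yes/no."""
    text = answer.strip().lower().rstrip(".")
    return YES_NO_MAP.get(text, text)


def accuracy(results: list[PairedResult], field: str) -> float:
    """Exact-match accuracy for one prediction field ('pred_en' or 'pred_id')."""
    if not results:
        return 0.0
    correct = sum(
        normalize(getattr(r, field)) == normalize(r.gold) for r in results
    )
    return correct / len(results)


def robustness_gap(results: list[PairedResult]) -> float:
    """Assumed definition: English accuracy minus Indonesian accuracy."""
    return accuracy(results, "pred_en") - accuracy(results, "pred_id")


def yes_no_flips(results: list[PairedResult]) -> int:
    """Count questions where the two languages give opposite yes/no answers."""
    flips = 0
    for r in results:
        if {normalize(r.pred_en), normalize(r.pred_id)} == {"yes", "no"}:
            flips += 1
    return flips


# Toy data for illustration only; these are not results from the paper.
sample = [
    PairedResult("yes", "Yes", "Ya"),                      # correct in both
    PairedResult("no", "No", "Ya"),                        # yes/no flip
    PairedResult("left lung", "left lung", "right lung"),  # laterality error
    PairedResult("yes", "no", "tidak"),                    # wrong in both, no flip
]

print(f"gap = {robustness_gap(sample):.2f}, flips = {yes_no_flips(sample)}")
```

Exact-match scoring is the simplest choice for a sketch like this. The abstract notes that the observed gap depends on the evaluation metric, so a different scoring rule would give different numbers.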


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
