Uncertainty-Calibrated Explainable Artificial Intelligence for Fetal Ultrasound Plane Classification: A Systematic Review
Abstract
Fetal ultrasound is the foundation of antenatal care, and precise identification of a small set of standard anatomical planes is essential for biometry, growth monitoring, and the detection of structural anomalies. Although deep learning classifiers now match or surpass human experts in accuracy on curated benchmarks, many of these models are opaque and miscalibrated. Clinicians are therefore often left without the calibrated confidence scores and reliable explanations required for safe decision support.
Following the PRISMA 2020 guidelines, we conducted a systematic review of 78 studies published between January 1, 2015, and April 30, 2026. The included studies combined automated fetal plane classification with either explainability techniques or predictive uncertainty quantification. The pooled balanced accuracy across six standard planes was 0.93 (95% CI 0.91 to 0.95). However, reporting of reliability metrics remains limited: only 19 studies (24%) reported calibration measures, and just 14 (18%) addressed selective prediction.
To address these gaps, we introduce CALIB-XFUS, a 22-item reporting framework designed to operationalize calibration, explanation faithfulness, and fairness for regulated fetal ultrasound artificial intelligence. This framework covers six key domains: clinical task and indication for use; dataset provenance and representativeness; model and training pipeline; calibration and selective prediction; explanation faithfulness and clinician validation; and post-market surveillance. We contend that achieving uncertainty calibration, faithful explanations, and fairness auditing in fetal ultrasound AI is not only technically viable but also a regulatory expectation under the FDA’s Good Machine Learning Practice principles and the high-risk obligations outlined in the EU AI Act.
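As an illustration of the two reliability quantities named above, the following is a minimal sketch of one common calibration measure (expected calibration error with equal-width bins) and of selective prediction (abstaining below a confidence threshold and reporting coverage alongside accuracy on the retained cases). It is not taken from any reviewed study or from CALIB-XFUS. The function names, the choice of 10 bins, the 0.9 threshold, and the toy data are all assumptions made for the example.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin.

    n_bins=10 is an assumed default, not a value prescribed by the review.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each confidence to a bin index 0..n_bins-1 using the interior edges;
    # right=True makes bins of the form (lower, upper].
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)


def selective_accuracy(confidences, correct, threshold):
    """Return (coverage, accuracy on retained cases) when abstaining below threshold."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    keep = confidences >= threshold
    coverage = float(keep.mean())
    accuracy = float(correct[keep].mean()) if keep.any() else float("nan")
    return coverage, accuracy


# Toy data (hypothetical): top-class confidence and whether the predicted plane was right.
conf = [0.95, 0.92, 0.85, 0.62, 0.55, 0.99]
hit = [1, 1, 0, 1, 0, 1]

print("ECE:", expected_calibration_error(conf, hit))
print("coverage, accuracy at 0.9:", selective_accuracy(conf, hit, threshold=0.9))
```

Sweeping the threshold over a range of values yields a risk-coverage curve, which is one way a study could report selective prediction rather than a single operating point.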
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





