Pre-Deployment Robustness Stress Testing for CT Segmentation Systems Using Clinically Motivated Multi-Corruption Augmentation
Title: Enhancing Reliability in CT Segmentation: A Pre-Deployment Stress Test via Clinically Driven Multi-Corruption Augmentation
Abstract
Deep learning models for CT segmentation often achieve high accuracy on clean benchmark datasets, but their performance degrades under the varied and noisy conditions typical of clinical practice. Image noise, reduced resolution, contrast fluctuations, intensity shifts, and imaging artifacts can all destabilize model predictions, which hinders safe integration into real-world medical imaging workflows. To address this vulnerability, we introduce the Robustness via Augmented Multi-corruption Pipeline (RAMP), an augmentation framework designed to improve robustness in CT segmentation tasks. RAMP combines stochastic multi-corruption composition with CT intensity transformations and anatomically constrained spatial perturbations, so that models are trained on images exhibiting clinically realistic degradation.
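The idea of stochastic multi-corruption composition can be illustrated with a minimal sketch. The code below is not the RAMP implementation: the corruption set, the severity scales, the Hounsfield-unit clipping range, and all function names (`multi_corrupt`, `add_gaussian_noise`, and so on) are assumptions chosen for illustration, and the anatomically constrained spatial perturbations are omitted. It only shows how a random subset of intensity-level corruptions might be sampled and chained on a CT volume while leaving the segmentation labels untouched.

```python
import numpy as np

# All corruption definitions and magnitudes below are illustrative assumptions,
# not the settings used by RAMP.

def add_gaussian_noise(vol, rng, severity):
    # Additive noise; standard deviation in HU grows with severity (assumed scale).
    return vol + rng.normal(0.0, 10.0 * severity, size=vol.shape)

def reduce_resolution(vol, rng, severity):
    # Subsample by an integer factor, then upsample by nearest-neighbour
    # repetition and crop back, so the voxel grid (and labels) stay aligned.
    f = 1 + severity
    small = vol[::f, ::f, ::f]
    up = small.repeat(f, axis=0).repeat(f, axis=1).repeat(f, axis=2)
    return up[: vol.shape[0], : vol.shape[1], : vol.shape[2]]

def shift_intensity(vol, rng, severity):
    # Global HU offset, e.g. a scanner calibration drift (assumed range).
    return vol + rng.uniform(-20.0, 20.0) * severity

def scale_contrast(vol, rng, severity):
    # Stretch or compress intensities around the volume mean.
    gain = 1.0 + rng.uniform(-0.1, 0.1) * severity
    mean = vol.mean()
    return (vol - mean) * gain + mean

CORRUPTIONS = [add_gaussian_noise, reduce_resolution, shift_intensity, scale_contrast]

def multi_corrupt(vol, rng, max_ops=3, max_severity=3, hu_range=(-1024.0, 3071.0)):
    """Apply a random subset of corruptions, in random order, to a CT volume."""
    n_ops = int(rng.integers(1, max_ops + 1))
    # Sampling without replacement yields a random subset in random order.
    chosen = rng.choice(len(CORRUPTIONS), size=n_ops, replace=False)
    out = vol.astype(np.float32)
    for idx in chosen:
        severity = int(rng.integers(1, max_severity + 1))
        out = CORRUPTIONS[idx](out, rng, severity)
    # Keep values inside a plausible HU range (assumed bounds).
    return np.clip(out, *hu_range).astype(np.float32)

rng = np.random.default_rng(0)
volume = rng.uniform(-1000.0, 1000.0, size=(8, 16, 16)).astype(np.float32)
corrupted = multi_corrupt(volume, rng)
```

Because every corruption in this sketch preserves the voxel grid, the same label map can be paired with the corrupted volume during training; a spatial perturbation would instead need to be applied jointly to image and labels.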
Evaluations on two CT segmentation benchmarks show that RAMP achieves higher performance on corrupted images and narrows the gap between clean and corrupted results more effectively than existing methods. On the five-organ noisy evaluation benchmark, RAMP increased the mean corrupted Dice score from 0.610 to 0.753 and reduced the robustness gap from 0.264 to 0.064, outperforming the nnU-Net baseline. Similarly, on the Abdomen1K dataset, it raised the mean corrupted Dice from 0.633 to 0.789 and reduced the robustness gap from 0.290 to 0.070. Although RAMP did not achieve the highest clean-image Dice scores, it prevented catastrophic segmentation failures under severe image degradation. These findings indicate that multi-corruption augmentation is a practical and effective pre-deployment strategy for improving the reliability of CT segmentation systems in heterogeneous clinical settings.
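The reported figures can be put in perspective with a short calculation. The sketch below assumes that the robustness gap is the mean clean Dice minus the mean corrupted Dice, which the abstract implies but does not state; under that assumption, the clean Dice values it derives are implied quantities, not results reported in the source. The helper names (`dice`, `relative_reduction`, `implied_clean_dice`) are hypothetical.

```python
def dice(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

def relative_reduction(before, after):
    """Fraction by which a quantity shrank, e.g. the robustness gap."""
    return (before - after) / before

def implied_clean_dice(corrupted, gap):
    # ASSUMPTION: gap = clean Dice - corrupted Dice, so clean = corrupted + gap.
    return corrupted + gap

# (before, after) pairs exactly as stated in the abstract.
reported = {
    "five-organ": {"corrupted": (0.610, 0.753), "gap": (0.264, 0.064)},
    "Abdomen1K": {"corrupted": (0.633, 0.789), "gap": (0.290, 0.070)},
}

for name, r in reported.items():
    c_before, c_after = r["corrupted"]
    g_before, g_after = r["gap"]
    print(
        f"{name}: corrupted Dice gain {c_after - c_before:+.3f}, "
        f"gap reduced by {relative_reduction(g_before, g_after):.1%}, "
        f"implied clean Dice {implied_clean_dice(c_before, g_before):.3f} -> "
        f"{implied_clean_dice(c_after, g_after):.3f} (derived, not reported)"
    )
```

Under the stated assumption, the gap shrinks by roughly three quarters on both benchmarks while the implied clean Dice drops, which is consistent with the statement that RAMP did not achieve the highest clean-image scores.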
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




