J-RAS: Mutual Adaptation for Medical Image Segmentation via Contrastive Retrieval-Augmented Joint Optimization
Abstract:
Manual segmentation by clinicians is accurate, but it is labor-intensive and subject to inter-expert variability. AI-driven approaches streamline the workflow but often degrade under data scarcity or domain shift. Drawing inspiration from pathology trainees who learn to identify disease by comparing new cases against expert-labeled slides and histopathology atlases, we introduce Joint Retrieval-Augmented Segmentation (J-RAS), a framework that gives segmentation networks explicit guidance in the form of retrieved image-mask pairs.
J-RAS jointly optimizes the segmentation and retrieval models by alternating between supervised and contrastive learning. This allows the retrieval network to identify contextually pertinent image-mask pairs, which in turn improves the segmentation model’s anatomical reasoning. In contrast to traditional retrieval-based augmentation methods that simply supply similar samples, J-RAS creates a feedback loop of mutual adaptation. Within this loop, the retrieval model learns to prioritize cues critical for segmentation, while the segmentation model uses the retrieved examples to sharpen boundary definition, increase resilience to rare pathologies, and improve generalization across datasets.
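The abstract does not specify the contrastive objective or the alternation schedule, so the following is only a minimal sketch of one possible reading. It assumes an InfoNCE-style loss for the retrieval model and a simple step-parity schedule; the names `info_nce` and `phase_for_step`, the temperature value, and the way the positive candidate is chosen are all hypothetical and not taken from the paper.

```python
import numpy as np

def info_nce(query, candidates, positive_idx, temperature=0.1):
    """InfoNCE-style loss for one query embedding (assumed objective).

    query:        (d,) embedding of the image being segmented
    candidates:   (n, d) embeddings of retrievable image-mask pairs
    positive_idx: index of the candidate treated as the positive, e.g. the
                  one that helped segmentation most (an assumption here)
    """
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = (c @ q) / temperature        # cosine similarity, scaled
    logits = logits - logits.max()        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[positive_idx]

def phase_for_step(step, period=2):
    """Hypothetical schedule: supervised on every `period`-th step,
    contrastive on the steps in between."""
    return "supervised" if step % period == 0 else "contrastive"

# Toy usage: the loss is low when the positive matches the query
# and high when it does not.
query = np.array([1.0, 0.0])
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_good = info_nce(query, candidates, positive_idx=0)
loss_bad = info_nce(query, candidates, positive_idx=1)
```

In a real training loop, the supervised phase would update the segmentation network on retrieved pairs, and the contrastive phase would update the retrieval network; how the two phases exchange signals is not described in the abstract.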
We validated J-RAS on four public benchmarks covering diverse imaging modalities, including ACDC and M&Ms (MRI), Breast Cancer Ultrasound, and CT scans for lung conditions and infections. The framework was tested with several backbone architectures (U-Net, TransUNet, SegFormer, and SAM), demonstrating its broad applicability. On the ACDC dataset, for example, SegFormer’s mean Dice coefficient rose from 0.8708$\pm$0.042 to 0.9115$\pm$0.031, and its Hausdorff Distance (HD) fell from 1.8130$\pm$2.49 to 1.1489$\pm$0.30. These findings show that retrieval-guided contrastive optimization can combine human-like instructional guidance with machine-learned precision in medical image segmentation.
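For readers unfamiliar with the two reported metrics, the sketch below computes a Dice coefficient and a symmetric Hausdorff distance for binary masks using NumPy only. It is an illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, distances are measured in pixels over all foreground pixels, and the paper may use boundary points, physical units, or a percentile variant instead.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between foreground pixel sets.

    Assumes both masks are non-empty; uses a dense pairwise distance
    matrix, which is only practical for small masks.
    """
    p = np.argwhere(np.asarray(pred, dtype=bool))
    t = np.argwhere(np.asarray(target, dtype=bool))
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy usage: a 2x2 prediction against a 2x3 ground-truth region.
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:3] = 1
target = np.zeros((4, 4), dtype=int)
target[1:3, 1:4] = 1
dice = dice_coefficient(pred, target)      # 2*4 / (4 + 6) = 0.8
hd = hausdorff_distance(pred, target)      # 1.0 pixel
```

A higher Dice and a lower HD both indicate better agreement, which matches the direction of the ACDC changes reported above.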
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC






