arXiv

Efficient Transformer-Based Localized Patch Sampling for Choroid Plexus Segmentation in Multiple Sclerosis

June 3, 2026 · Po-Jui Lu, Alessandro Cagol, Mario Ocampo-Pineda, Federico Spagnolo, Marina Mastantuono, Andreea-Alexandra Aldea, Jannis Müller, Özgür Yaldizli, Matthias Weigel, Lester Melie-Garcia, Roberta Magliozzi, Maria Pia Sormani, Ludwig Kappos, Jens Kuhle, C

Title: Leveraging Efficient Transformer-Based Localized Patch Sampling for Choroid Plexus Segmentation in Multiple Sclerosis

Abstract

Background: The lateral ventricle choroid plexus (LVCP) is increasingly recognized as an imaging biomarker for multiple sclerosis (MS), particularly for its association with physical disability and neuroinflammation. However, manual segmentation of the LVCP is labor-intensive, which hinders its adoption in large-scale clinical trials and longitudinal studies. To address this challenge, this study introduces a SwinUNETR-based pipeline that uses targeted sampling of small intra- and peri-ventricular patches to automatically segment the LVCP in MS patients from either single-modality or multi-modal MRI data.

Methods: We conducted a retrospective analysis of 3T MRI scans drawn from three distinct datasets, comprising two separate cohorts primarily focused on MS (Dataset 1: n=177; Dataset 2: n=177) and an expanded test set (n=388). The proposed approach used a SwinUNETR architecture trained on patches of 32×32×32 voxels. This model was benchmarked against the 3D UXNET model. Evaluation relied primarily on the Dice Similarity Coefficient (DSC), with additional metrics including the 95th percentile Hausdorff Distance (HD95) and computational complexity (measured in GFLOPs).
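As an illustration of the two ingredients named in the Methods (localized 32×32×32 patches and the DSC), the following is a minimal NumPy sketch, not the authors' implementation. The function names (`sample_patch_origin`, `extract_patch`, `dice`), the use of a binary region mask to guide sampling, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example; the abstract does not specify how the intra- and peri-ventricular region is defined.

```python
import numpy as np

PATCH = 32  # patch edge length in voxels, as stated in the Methods


def sample_patch_origin(region_mask, rng, patch=PATCH):
    """Return the origin of a cubic patch roughly centred on a random voxel
    of region_mask (a hypothetical intra-/peri-ventricular mask)."""
    shape = np.array(region_mask.shape)
    if np.any(shape < patch):
        raise ValueError("volume is smaller than the patch size")
    coords = np.argwhere(region_mask)
    if len(coords) == 0:
        raise ValueError("region_mask is empty")
    centre = coords[rng.integers(len(coords))]
    # Shift the patch back inside the volume if it would cross a border.
    origin = np.clip(centre - patch // 2, 0, shape - patch)
    return tuple(int(o) for o in origin)


def extract_patch(volume, origin, patch=PATCH):
    """Slice a cubic patch out of a 3D volume."""
    z, y, x = origin
    return volume[z:z + patch, y:y + patch, x:x + patch]


def dice(pred, truth):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.zeros((64, 64, 64), dtype=bool)
    mask[20:30, 25:35, 28:36] = True  # toy stand-in for the sampling region
    origin = sample_patch_origin(mask, rng)
    print(extract_patch(mask, origin).shape)
```

Sampling origins only around voxels of the region mask keeps every training patch near the ventricles, which is the general idea behind localized sampling; the actual sampling rule used in the study may differ.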

Results: In the extended test set, the SwinUNETR model achieved a mean DSC of 0.868 (95% CI: 0.863-0.872) when combining MPRAGE and FLAIR sequences. This was a statistically significant improvement over UXNET, which yielded a DSC of 0.858 (95% CI: 0.853-0.862; p<0.0001). When only FLAIR inputs were available, the transformer-based method maintained a DSC of 0.863. In contrast, UXNET showed a notable decline in spatial localization accuracy, with its HD95 increasing from 1.86 mm to 3.00 mm. Furthermore, the proposed framework reduced computational requirements by more than 99%, from 22,080 GFLOPs for the comparator to 91.8 GFLOPs.
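The reported reduction in computational cost can be checked directly from the two GFLOPs figures in the Results. The short sketch below does only that arithmetic; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline, proposed):
    """Fraction of the baseline cost saved by the proposed method."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# GFLOPs figures as reported in the Results.
COMPARATOR_GFLOPS = 22_080.0
PROPOSED_GFLOPS = 91.8

reduction = relative_reduction(COMPARATOR_GFLOPS, PROPOSED_GFLOPS)
ratio = COMPARATOR_GFLOPS / PROPOSED_GFLOPS

print(f"relative reduction: {reduction:.1%}")
print(f"cost ratio: {ratio:.1f}x")
```

With these inputs the reduction comes to about 99.6%, meaning the comparator needs roughly 240 times as many GFLOPs as the proposed framework.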

Conclusion: By combining localized patch sampling with a SwinUNETR architecture, this methodology provides an accurate, robust, and statistically superior alternative to existing state-of-the-art models for LVCP segmentation. The substantial decrease in computational cost positions this approach as highly suitable for broad implementation in both research and clinical settings.
