arXiv

CDPM-Align: Multi-Scale Guidance-Aligned Diffusion Pretraining for Robust Few-Shot Anatomical Landmark Detection

June 4, 2026 · Roberto Di Via, Irina Voiculescu, Francesca Odone, Vito Paolo Pastore

Anatomical landmark detection is a cornerstone of medical image analysis, underpinning numerous diagnostic and interventional procedures. Although contemporary approaches can achieve sub-millimeter precision, high accuracy alone does not guarantee clinical viability; models must also be reliable and robust in their predictions. Nevertheless, the role of representation learning in this clinical context remains largely unexamined. To address this gap, we present CDPM-Align, a novel framework that uses multi-scale guidance-aligned conditional diffusion pre-training for anatomical landmark detection.

Our methodology targets limited-data scenarios, specifically few-image and few-annotation regimes. We leverage representation learning through conditional generative pre-training on three widely used, heterogeneous small-scale benchmark datasets. We then evaluate the downstream landmark detection task in low-annotation settings, using only 10 and 25 annotated images. This setup mirrors the practical trade-off between clinical workload and the resources available for annotation.

Our findings indicate that generative pre-training yields robust feature representations, which in turn improve accuracy and produce better-calibrated uncertainty estimates in downstream tasks, moving the field closer to safe and efficient deployment in clinical practice.
