Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation
Abstract:
While the self-supervised pre-training paradigm has become increasingly vital for acquiring transferable representations in medical imaging, current approaches for ultrasound (US) images are largely limited to the image or frame level. These methods frequently neglect the anatomical context necessary for learning representations that align with clinical needs. To address this gap, we introduce ANAUS, a novel ultrasound self-supervision framework that redirects representation learning from generic visual areas to clinically significant anatomical structures.
Our approach uses a learnable latent prompt engine and a one-time domain adaptation on existing public image-mask pairs, which enables the LP-SAM module to perform scalable, annotation-free anatomy delineation. Building on this anatomical foundation, we develop a dual-policy self-supervised learning paradigm with two components: contextual core-region prediction and inter-view, semantics-aware, anatomy-separating alignment. The alignment component enforces feature invariance within the same anatomical region while increasing discriminability between different structures. The prediction component requires the model to reconstruct corrupted areas, so that it captures fine structural detail.
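To make the alignment idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each view comes with a dense feature map and an integer anatomy label map (for example, one produced by a segmentation module), pools one prototype per anatomical region, and applies an InfoNCE-style loss so that same-region prototypes across views attract and different-region prototypes repel. The function names `region_prototypes` and `anatomy_alignment_loss`, the `temperature` parameter, and the choice of InfoNCE are all assumptions made for illustration; the abstract does not specify the loss.

```python
import numpy as np


def region_prototypes(feats, labels, num_regions):
    """Masked average pooling: one L2-normalised vector per anatomy label.

    feats:  (H, W, C) float array of per-pixel features (hypothetical input).
    labels: (H, W) int array with values in [0, num_regions).
    Returns (prototypes, present), where present[r] says whether region r occurs.
    """
    flat_f = feats.reshape(-1, feats.shape[-1])
    flat_l = labels.reshape(-1)
    protos = np.zeros((num_regions, flat_f.shape[1]))
    present = np.zeros(num_regions, dtype=bool)
    for r in range(num_regions):
        sel = flat_l == r
        if sel.any():
            protos[r] = flat_f[sel].mean(axis=0)
            present[r] = True
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    return protos / np.maximum(norms, 1e-8), present


def anatomy_alignment_loss(feats_a, labels_a, feats_b, labels_b,
                           num_regions, temperature=0.1):
    """InfoNCE-style loss over region prototypes from two views (an assumption).

    The positive pair for region r in view A is region r in view B;
    every other region present in both views acts as a negative.
    """
    pa, ok_a = region_prototypes(feats_a, labels_a, num_regions)
    pb, ok_b = region_prototypes(feats_b, labels_b, num_regions)
    shared = ok_a & ok_b                     # regions visible in both views
    if not shared.any():
        return 0.0
    pa, pb = pa[shared], pb[shared]
    logits = pa @ pb.T / temperature         # (R, R) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

In this sketch the loss is small when the two views assign similar features to the same anatomical label and large when features of different structures are confused, which mirrors the invariance and discriminability goals stated above. The contextual core-region prediction component would be a separate reconstruction term and is not shown.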
Comprehensive evaluations across six public datasets reveal that our method consistently surpasses existing state-of-the-art techniques. Furthermore, it preserves the computational efficiency required for practical clinical deployment. The source code for this work is accessible at https://github.com/zhcz328/ANAUS.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



