Motion-Guided Causal Disentanglement for Robust Multi-View Cine Cardiac MRI Diagnosis
Abstract:
Multi-view cardiac magnetic resonance (CMR) imaging is widely used for noninvasive disease assessment because its views provide complementary anatomical information. Recent transformer-based architectures have shown strong representation learning for CMR analysis, but they generally rely on a single unified latent embedding. Such an embedding tends to entangle view-specific anatomical variation with pathology-related features, biasing classifiers toward structural attributes rather than view-invariant pathological patterns. The limitation is most pronounced in low-data settings, such as underrepresented cardiac conditions, where sample scarcity increases the risk of shortcut learning and view-dependent decision boundaries.
To overcome these challenges, we introduce MoViD (Motion-Guided View–Disease Disentanglement), a framework built on a ViT-MAE backbone. MoViD explicitly separates the latent representation into a disease-discriminative component and a view-specific component. The separation is enforced by dual-branch supervised contrastive objectives together with a gradient-reversal adversarial constraint that minimizes leakage of disease information into the view embedding. The model also incorporates an annotation-free temporal motion feature, extracted from inter-frame difference maps, to localize the beating heart and suppress background noise. To address class imbalance, a focal reweighting mechanism is integrated into the contrastive loss.
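As a rough illustration of the annotation-free motion feature, the sketch below averages absolute inter-frame differences over a cine sequence and thresholds the result to localize the moving region. The abstract does not state how the difference maps are aggregated or used, so the mean aggregation, the max normalization, the 0.5 threshold, and the function names `motion_map` and `motion_bounding_box` are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def motion_map(cine, eps=1e-8):
    """Average absolute inter-frame difference of a (T, H, W) cine sequence.

    Hypothetical helper: returns an (H, W) map scaled to [0, 1], where
    larger values mark pixels that change more across the cardiac cycle.
    """
    cine = np.asarray(cine, dtype=np.float64)
    if cine.ndim != 3 or cine.shape[0] < 2:
        raise ValueError("expected a (T, H, W) array with T >= 2")
    diffs = np.abs(np.diff(cine, axis=0))  # shape (T-1, H, W)
    motion = diffs.mean(axis=0)            # assumed aggregation: temporal mean
    return motion / (motion.max() + eps)   # assumed normalization: divide by max


def motion_bounding_box(motion, threshold=0.5):
    """Bounding box (y0, y1, x0, x1) of pixels at or above the threshold.

    The end indices are exclusive. Returns None if no pixel qualifies.
    """
    ys, xs = np.nonzero(motion >= threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1


# Synthetic example: a static random background with one pulsating square
# standing in for the beating heart.
rng = np.random.default_rng(0)
T, H, W = 8, 32, 32
background = rng.uniform(0.0, 1.0, size=(H, W))
cine = np.repeat(background[None], T, axis=0)
phase = np.sin(2 * np.pi * np.arange(T) / T)
cine[:, 10:20, 12:22] += phase[:, None, None]

m = motion_map(cine)
box = motion_bounding_box(m)
print(box)
```

On this synthetic sequence the static background yields zero motion, so the recovered box coincides with the pulsating square. Real cine data contains noise and respiratory motion, so the threshold would need tuning, and a learned model would more plausibly consume the map as an extra input or attention prior than as a hard crop.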
We evaluate the framework on two public benchmarks (M&Ms and M&Ms2) and a private clinical dataset focused on venous thrombosis. Our method consistently surpasses standard transformer baselines in both cardiac segmentation and disease classification, and achieves performance comparable to large-scale pretrained foundation models, supporting the effectiveness of structural disentanglement in medical image analysis.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






