MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video
Title: MAEPose: Enabling Self-Supervised Spatiotemporal Learning for Human Pose Estimation via mmWave Video
Abstract
Millimeter-wave (mmWave) radar offers a privacy-conscious alternative to RGB-based human pose estimation. However, current approaches predominantly depend on pre-extracted intermediate representations, such as spectrogram images or sparse point clouds. This preprocessing discards the rich spatiotemporal information in raw radar video streams and adds system complexity. Existing frameworks are also largely limited to end-to-end supervised learning and therefore do not exploit unlabelled raw video streams to learn generalized representations.
To address these limitations, we introduce MAEPose, a human pose estimation framework based on masked autoencoding that operates directly on mmWave spectrogram videos. MAEPose learns motion-aware, generalized spatiotemporal representations from unlabelled radar video and then uses a heatmap decoder to produce multi-frame pose predictions. Evaluated on three datasets with leave-one-person-out cross-validation and statistical testing, MAEPose consistently outperforms state-of-the-art baselines, improving Mean Per Joint Position Error (MPJPE) by up to 22.1% with statistical significance (p<0.05). The model is also robust: under zero-shot bystander interference, its error increases by only 6.5%. Ablation studies confirm that both the pre-training phase and the heatmap decoder contribute substantially to performance. Finally, a modality analysis shows that Range-Doppler video input yields better pose estimation than Range-Azimuth data or the fusion of the two, at lower computational cost.
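For readers unfamiliar with the metric, the following is a minimal sketch of MPJPE as it is commonly defined: the Euclidean distance between predicted and ground-truth joint positions, averaged over all joints and frames. The abstract does not give the exact evaluation protocol (units, joint set, any alignment step), so the function names `mpjpe` and `relative_change_percent`, the toy coordinates, and the way the percentage is computed are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import Sequence

Joint = Sequence[float]   # one joint position, e.g. (x, y, z)
Pose = Sequence[Joint]    # all joints of a single frame


def mpjpe(pred: Sequence[Pose], truth: Sequence[Pose]) -> float:
    """Mean Euclidean joint distance over all frames and joints."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of frames")
    total, count = 0.0, 0
    for pose_p, pose_t in zip(pred, truth):
        if len(pose_p) != len(pose_t):
            raise ValueError("each frame must have the same number of joints")
        for joint_p, joint_t in zip(pose_p, pose_t):
            total += math.dist(joint_p, joint_t)  # per-joint position error
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


def relative_change_percent(reference: float, value: float) -> float:
    """Signed change of `value` relative to `reference`, in percent.

    Assumption: negative means lower error than the reference.
    """
    return 100.0 * (value - reference) / reference


# Hypothetical toy data: one frame, two joints, arbitrary units.
truth = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]]
error = mpjpe(pred, truth)  # per-joint distances are 5.0 and 0.0
```

Under this convention, a method whose MPJPE is 77.9 against a baseline of 100.0 (made-up numbers) corresponds to `relative_change_percent(100.0, 77.9)` of about -22.1, that is, a 22.1% reduction in error.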
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






