DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos
Abstract:
Monocular video human mesh recovery (HMR) is a critical component of applications such as embodied simulation, avatar animation, and digital human creation, all of which demand both expressive whole-body movement and temporal stability. Current video-based HMR approaches typically generate smooth body motion but frequently neglect fine hand articulation. Conversely, image-based whole-body methods reconstruct SMPL-X meshes frame by frame, which often results in jittery and imprecise hand movements.
To address these limitations, we introduce a temporally coherent framework for whole-body HMR tailored to challenging, real-world monocular videos. Through residual body-hand fusion, our model integrates body context with part-specific hand observations, achieving stable body motion and detailed hand recovery within a unified temporal architecture. Additionally, we implement a close-up-aware augmentation strategy to improve robustness when only the subject's upper body is in frame. Evaluations on both whole-body and body-only benchmarks show that our approach yields superior hand reconstruction alongside competitive body accuracy. Furthermore, the method generates SMPL-X motion that is both temporally stable and 2D-consistent, even in difficult real-world scenarios.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC






