ZeroWBC: Learning Natural Whole-Body Humanoid Interaction from Human Egocentric Data
Abstract:
Controlling humanoid robots for versatile and natural whole-body interactions remains a significant challenge, largely because whole-body teleoperation data is prohibitively expensive to collect. To address this, we introduce ZeroWBC, a novel framework that eliminates the need for teleoperation by learning interaction capabilities directly from human egocentric videos paired with synchronized whole-body motion capture and text annotations.
ZeroWBC uses a "generation-then-tracking" approach to handle whole-body interactions in static scenes. Given an initial egocentric image and a language instruction, a fine-tuned Vision-Language Model predicts future human whole-body motion tokens. These tokens are decoded into continuous motion sequences and then retargeted to the humanoid robot. A general interactive motion tracking policy then executes these reference motions together with the specified root and key body-part trajectories.
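The generation-then-tracking flow described above can be sketched as a chain of four stages. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Codebook`, `retarget`, and `generate_then_track` are hypothetical, the token decoder is assumed to be a simple lookup from token id to a short chunk of frames, and retargeting is reduced to a uniform scale purely for illustration. The Vision-Language Model and the tracking policy are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

Pose = List[float]  # one frame of whole-body motion (hypothetical flat layout)


@dataclass
class Codebook:
    """Hypothetical decoder: each motion token maps to a short chunk of frames."""
    chunks: Dict[int, List[Pose]]

    def decode(self, tokens: Sequence[int]) -> List[Pose]:
        # Concatenate the chunks in token order to get a continuous sequence.
        motion: List[Pose] = []
        for token in tokens:
            motion.extend(self.chunks[token])
        return motion


def retarget(motion: List[Pose], scale: float) -> List[Pose]:
    """Stand-in for human-to-robot retargeting (assumed: a uniform scale)."""
    return [[scale * value for value in frame] for frame in motion]


def generate_then_track(
    image: Any,
    instruction: str,
    predict_tokens: Callable[[Any, str], Sequence[int]],
    codebook: Codebook,
    scale: float,
    track: Callable[[List[Pose]], List[Pose]],
) -> List[Pose]:
    # 1. Generation: the (assumed) VLM maps image + instruction to motion tokens.
    tokens = predict_tokens(image, instruction)
    # 2. Decode tokens into a continuous human motion sequence.
    human_motion = codebook.decode(tokens)
    # 3. Retarget the human motion to the robot to obtain reference motion.
    robot_reference = retarget(human_motion, scale)
    # 4. Tracking: the policy executes the reference motion.
    return track(robot_reference)
```

In this sketch the model and the policy are injected as functions, so each stage can be replaced by a stub when testing the others.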
To improve interaction quality, we propose an interaction-oriented tracking reward. This reward emphasizes the alignment of global root and key body-part trajectories while preserving natural whole-body dynamics. Evaluations on the Unitree G1 humanoid robot demonstrate that ZeroWBC enables diverse, scene-aware behaviors without requiring any robot teleoperation demonstrations. These findings point to a scalable new paradigm for teaching natural whole-body interaction to humanoids using human egocentric data.
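One plausible shape for such a reward is a weighted sum of tracking terms, with larger weights on the global root and key body-part errors and a smaller term that keeps the whole-body pose close to the reference. The sketch below is an assumption, not the paper's formula: the exponential kernel is a common choice in motion-tracking rewards, and the function name, weights, and `sigma_*` scales are hypothetical values chosen only for illustration.

```python
import math
from typing import Dict, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interaction_tracking_reward(
    root_pos: Sequence[float],
    ref_root_pos: Sequence[float],
    key_body_pos: Dict[str, Sequence[float]],
    ref_key_body_pos: Dict[str, Sequence[float]],
    joint_pos: Sequence[float],
    ref_joint_pos: Sequence[float],
    w_root: float = 0.4,   # hypothetical weight on global root tracking
    w_key: float = 0.4,    # hypothetical weight on key body-part tracking
    w_pose: float = 0.2,   # hypothetical weight on whole-body pose tracking
    sigma_root: float = 0.25,
    sigma_key: float = 0.25,
    sigma_pose: float = 0.5,
) -> float:
    # Global root error: where the robot is in the scene.
    root_err = sq_dist(root_pos, ref_root_pos)
    # Mean error over key body parts (e.g. hands, feet) in the global frame.
    key_err = sum(
        sq_dist(key_body_pos[name], ref_key_body_pos[name])
        for name in ref_key_body_pos
    ) / len(ref_key_body_pos)
    # Mean joint-space error: keeps the overall motion close to the reference.
    pose_err = sq_dist(joint_pos, ref_joint_pos) / len(ref_joint_pos)
    # Each term lies in (0, 1]; it equals 1 when tracking is exact.
    return (
        w_root * math.exp(-root_err / sigma_root)
        + w_key * math.exp(-key_err / sigma_key)
        + w_pose * math.exp(-pose_err / sigma_pose)
    )
```

With these default weights, exact tracking yields the maximum reward of 1, and an error in the root or key body-part position lowers the reward more than a comparable joint-space error.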
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC



