Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
Abstract: We present Humanoid-GPT, a GPT-style Transformer with causal attention for humanoid whole-body control, pre-trained on a billion-scale motion corpus. Unlike previous trackers built on shallow MLPs, which are limited by data scarcity and by a trade-off between agility and generalization, Humanoid-GPT is pre-trained on a retargeted corpus of 2 billion frames that harmonizes the major existing motion-capture (mocap) datasets with extensive in-house recordings. By scaling both model capacity and data volume, we obtain a single generative Transformer that tracks highly dynamic behaviors and generalizes zero-shot to unseen motions and control tasks to a degree not achieved by prior trackers. Extensive experiments and scaling analyses show that this approach sets a new state of the art, combining precise tracking of complex, highly dynamic movements with robust zero-shot generalization to novel tasks.
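The abstract does not specify how motion frames are tokenized or how attention is configured. As a minimal, hypothetical sketch of what "causal attention" means in this setting, assume each frame of the motion sequence is embedded as one token of width d. The snippet below implements single-head masked self-attention in NumPy, so the output for frame t depends only on frames 0 through t. The function name `causal_self_attention`, the shapes, and the random weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (T, d) sequence of frame tokens.

    Hypothetical sketch: token t may attend only to tokens 0..t, so the
    output for frame t never depends on future frames.
    Returns the (T, d_v) outputs and the (T, T) attention weights.
    """
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask out future positions (strict upper triangle) before the softmax.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax; the diagonal is always unmasked, so each row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: 5 frame tokens of width 8 with random (illustrative) projection weights.
rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.standard_normal((T, d))
w = [rng.standard_normal((d, d)) for _ in range(3)]
out, attn = causal_self_attention(x, *w)
```

Because of the mask, `attn` is lower-triangular, and running the function on only the first few frames reproduces the corresponding rows of `out`. This prefix property is what makes a GPT-style model usable for online, frame-by-frame control.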
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



