arXiv

HOIST: Humanoid Optimization with Imitation and Sample-efficient Tuning for Manipulating Suspended Loads

June 2, 2026 · Songyang Liu, Shunyu Yao, Dingyuan Huang, Shuai Li

Abstract: Guiding humanoid robots to handle suspended payloads is difficult because the system is underactuated and oscillatory: the robot can affect the load only through intermittent contact and whole-body movements. Imitation learning offers a safe starting point for behavior, but it does not directly optimize final placement. Conversely, training reinforcement learning policies from scratch is both unsafe and sample-inefficient on real-world humanoids. To address these issues, we introduce HOIST (Humanoid Optimization with Imitation and Sample-efficient Tuning), a method designed specifically for manipulating suspended loads. HOIST first fine-tunes a high-level vision-language-action (VLA) policy on demonstrations gathered via virtual-reality (VR) teleoperation, with commands executed through a whole-body controller. It then employs iterative batched reinforcement learning alongside VLA rollouts to improve both stopping behavior and placement precision. Our experiments, conducted in simulation and on physical humanoid platforms, indicate that HOIST outperforms baselines that rely on imitation alone or on additional demonstrations. Compared with pure VLA rollouts, HOIST reduces translational placement error by 19.9 cm and raw angular error by 3.56 degrees. These results highlight the viability of humanoid robots for underactuated material-handling tasks.
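The abstract reports placement quality as a translational error in centimeters and an angular error in degrees, but does not define either metric. As a rough illustration only, the sketch below shows one common way such errors could be computed for a planar placement. The function name `placement_errors`, the 2D position convention, and the use of a wrapped yaw difference are all assumptions made here and are not taken from the paper.

```python
import math


def placement_errors(final_xy_cm, final_yaw_deg, target_xy_cm, target_yaw_deg):
    """Hypothetical metric: planar placement error of a load against a target pose.

    Assumes positions are (x, y) in centimeters and orientation is a single
    yaw angle in degrees. The paper's actual metric may differ.
    """
    # Translational error: Euclidean distance between final and target positions.
    dx = final_xy_cm[0] - target_xy_cm[0]
    dy = final_xy_cm[1] - target_xy_cm[1]
    translational_cm = math.hypot(dx, dy)

    # Angular error: wrap the yaw difference into [-180, 180) before taking
    # the magnitude, so that 350 deg vs 10 deg counts as 20 deg, not 340 deg.
    diff = (final_yaw_deg - target_yaw_deg + 180.0) % 360.0 - 180.0
    angular_deg = abs(diff)

    return translational_cm, angular_deg
```

Under these assumptions, an improvement such as the one reported would correspond to the difference between the mean of each error over pure VLA rollouts and the mean over HOIST rollouts.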

Source: arXiv