AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
Abstract:
Reconstructing dynamic hand-object interactions from single-view video is essential for collecting dexterous manipulation data and for building realistic digital twins for virtual reality and robotics. However, existing methods face two major obstacles. First, their heavy reliance on neural rendering often yields fragmented geometry that is unsuitable for simulation, especially under severe occlusion. Second, their dependence on fragile Structure-from-Motion (SfM) initialization causes frequent failures on unconstrained, in-the-wild footage.
To address these challenges, we present AGILE, a robust framework that shifts interaction learning from reconstruction to agentic generation. First, we introduce an agentic pipeline in which a Vision-Language Model (VLM) guides a generative model to synthesize a complete, watertight object mesh with high-fidelity textures, a process that is unaffected by occlusions in the video. Second, we bypass fragile SfM with a robust anchor-and-track strategy: a foundation model initializes the object pose in a single frame at the onset of interaction, and this pose is then propagated through time by exploiting the strong visual correlation between the generated asset and the video observations. Finally, a contact-aware optimization combines semantic, geometric, and interaction-stability constraints to enforce physical plausibility.
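To make the anchor-and-track idea concrete, the following is a minimal sketch under stated assumptions. It is not AGILE's tracker. It assumes that the object pose at the anchor frame is already known (in AGILE this comes from a foundation model) and that the same N surface points are tracked in 3D camera coordinates across frames. In place of the authors' method, it uses a standard closed-form technique, the Kabsch (SVD) algorithm, to estimate each frame-to-frame rigid motion, and it chains those motions onto the anchor pose. The helper names `kabsch`, `propagate_poses`, and `rot_z` are hypothetical.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def propagate_poses(anchor_R, anchor_t, tracks):
    """Chain frame-to-frame rigid motions onto the anchor-frame pose.

    tracks: list of (N, 3) arrays, the same object points in camera
    coordinates for each frame, starting at the anchor frame.
    Returns a list of (R, t) object-to-camera poses, one per frame.
    """
    poses = [(anchor_R, anchor_t)]
    for prev, curr in zip(tracks[:-1], tracks[1:]):
        dR, dt = kabsch(prev, curr)              # motion from frame k-1 to k
        R_prev, t_prev = poses[-1]
        # x_k = dR (R_prev x + t_prev) + dt
        poses.append((dR @ R_prev, dR @ t_prev + dt))
    return poses


def rot_z(theta):
    """Rotation about the camera z-axis (used only to synthesize test data)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Synthetic demo: points standing in for samples on the generated mesh.
rng = np.random.default_rng(0)
obj_pts = rng.normal(size=(50, 3))
gt = [(rot_z(0.1 * k), np.array([0.01 * k, 0.0, 0.5])) for k in range(10)]
tracks = [obj_pts @ R.T + t for R, t in gt]

poses = propagate_poses(*gt[0], tracks)          # anchor = ground truth at frame 0
rot_err = max(np.abs(R - Rg).max() for (R, _), (Rg, _) in zip(poses, gt))
trans_err = max(np.abs(t - tg).max() for (_, t), (_, tg) in zip(poses, gt))
print(f"max rotation error: {rot_err:.2e}, max translation error: {trans_err:.2e}")
```

On these noise-free synthetic tracks, the propagated poses match ground truth up to floating-point error. With real, noisy tracks, chaining frame-to-frame estimates accumulates drift. This is presumably why tracking against the complete generated asset, rather than only against the previous frame, is attractive.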
Comprehensive evaluations on the HO3D, DexYCB, and ARCTIC datasets, as well as on in-the-wild videos, show that AGILE outperforms baseline methods in global geometric accuracy and remains robust on challenging sequences where prior methods typically fail. Because it prioritizes physical validity, our method produces simulation-ready assets, which we validate through real-to-sim retargeting for robotic tasks.
Project page: https://agile-hoi.github.io
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





