arXiv

AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

Abstract:

The ability to reconstruct dynamic hand-object interactions from single-view video is essential for gathering data on dexterous manipulation and for developing lifelike digital twins for virtual reality and robotics. Nevertheless, existing techniques are hindered by two significant obstacles: first, the heavy reliance on neural rendering often results in disjointed geometries that are unsuitable for simulation, particularly when occlusions are severe; second, the dependence on fragile Structure-from-Motion (SfM) initialization causes frequent breakdowns when processing uncontrolled, real-world footage.

To address these challenges, we present AGILE, a robust framework that shifts the approach from traditional reconstruction to agentic generation for interaction learning. First, we use an agentic pipeline in which a Vision-Language Model (VLM) directs a generative model to create a complete, watertight object mesh with high-fidelity textures, a process that is unaffected by occlusions in the video. Second, we eliminate the need for fragile SfM with a robust anchor-and-track strategy: the object’s pose is initialized in a single frame at the onset of interaction using a foundation model, and is then propagated over time by exploiting the strong visual correlation between the generated asset and the video observations. Finally, a contact-aware optimization applies semantic, geometric, and interaction-stability constraints to ensure physical plausibility.
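To make the anchor-and-track idea concrete, the following is a minimal sketch of how a pose fixed at a single anchor frame could be carried to the rest of a sequence. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how poses are propagated, so the sketch assumes that some tracker already supplies a rigid relative motion between each pair of consecutive frames. The names `propagate_poses`, `rigid_inverse`, `matmul4`, and `translation` are hypothetical.

```python
from typing import List

Mat4 = List[List[float]]  # 4x4 homogeneous rigid transform, row-major


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t: Mat4) -> Mat4:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x: float, y: float, z: float) -> Mat4:
    """Helper for the example: a pure translation."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def propagate_poses(anchor_idx: int, anchor_pose: Mat4,
                    rel_motions: List[Mat4]) -> List[Mat4]:
    """Spread one anchor pose over all frames.

    Assumption (not from the paper): rel_motions[t] maps the object pose
    at frame t to the pose at frame t + 1, so pose[t+1] = rel[t] @ pose[t].
    """
    n = len(rel_motions) + 1
    poses: List[Mat4] = [anchor_pose] * n
    # Forward in time from the anchor frame.
    for t in range(anchor_idx + 1, n):
        poses[t] = matmul4(rel_motions[t - 1], poses[t - 1])
    # Backward in time from the anchor frame.
    for t in range(anchor_idx - 1, -1, -1):
        poses[t] = matmul4(rigid_inverse(rel_motions[t]), poses[t + 1])
    return poses


# Three frames, anchored at the middle one.
poses = propagate_poses(
    anchor_idx=1,
    anchor_pose=translation(0.0, 0.0, 0.0),
    rel_motions=[translation(1.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)],
)
```

Anchoring at one reliable frame and propagating in both directions means the sequence does not depend on a multi-view initialization succeeding, which is the motivation the abstract gives for avoiding SfM. In practice each propagated pose would presumably be refined against the image rather than composed blindly, since composed relative motions accumulate drift.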

Comprehensive evaluations on the HO3D, DexYCB, and ARCTIC datasets, as well as on in-the-wild videos, demonstrate that AGILE surpasses baseline methods in global geometric accuracy. It also exhibits superior robustness in difficult sequences where previous methods typically fail. By emphasizing physical validity, our technique yields simulation-ready assets, which have been validated through real-to-sim retargeting for robotic tasks.

Project page: https://agile-hoi.github.io


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
