arXiv

Genie 4D: Semantic-Prior-Guided 4D Dynamic Scene Reconstruction

Abstract:

4D reconstruction of dynamic scenes bridges low-level geometric sensing and high-level semantic understanding, and is central to progress in computer vision and robotic perception. In this work, we introduce Genie 4D, a framework that turns standard smartphone footage into a semantically anchored, action-responsive 4D world model. The architecture couples a real-time visual-inertial Gaussian splatting front-end, which recovers metric geometry, with a feed-forward 4D backbone. Frozen DINOv3 features stabilize this backbone as structural priors, and these semantic constraints mitigate identity drift during dynamic tracking. To recover the fine detail that regression backends often lose, a conditional diffusion refiner restores high-frequency surface textures.

The pipeline ends with a lightweight latent-action head that connects the reconstructed 4D state to a Genie-style world model. This model is trained with a JEPA-style next-embedding objective, so the scene can be rolled forward in time in response to user inputs. Evaluated on the Point Odyssey and TUM-Dynamics benchmarks, Genie 4D retains the linear time complexity, O(T), of feed-forward baselines while significantly improving both 3D tracking accuracy (APD) and reconstruction completeness. The framework runs interactively on a single consumer-grade GPU (RTX 5090) and supports capture clients on iPhone, Mac, Windows, and Linux. Genie 4D thus offers a viable, semantically guided route toward physically grounded world models.
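
To make the next-embedding objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical linear predictor (`W_z`, `W_a`) that maps the current embedding and a latent action to a predicted next embedding, and it scores that prediction against the observed next embedding with a mean squared error. All names, the linear form, and the choice of squared error are illustrative assumptions; the abstract specifies none of them. The loss makes one pass over the sequence, in line with the O(T) cost mentioned above.

```python
from typing import List, Sequence

Vector = List[float]


def predict_next(z_t: Sequence[float], a_t: Sequence[float],
                 W_z: Sequence[Sequence[float]],
                 W_a: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear predictor: z_hat[t+1] = W_z @ z[t] + W_a @ a[t]."""
    return [
        sum(w * z for w, z in zip(row_z, z_t))
        + sum(w * a for w, a in zip(row_a, a_t))
        for row_z, row_a in zip(W_z, W_a)
    ]


def next_embedding_loss(embeddings: Sequence[Sequence[float]],
                        actions: Sequence[Sequence[float]],
                        W_z: Sequence[Sequence[float]],
                        W_a: Sequence[Sequence[float]]) -> float:
    """Mean squared error between predicted and observed next embeddings.

    embeddings: T frame embeddings; actions: T-1 latent actions.
    The target embedding is treated as a constant, which mirrors the
    stop-gradient target commonly used in JEPA-style training (assumed here).
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to form a transition")
    total, count = 0.0, 0
    for t in range(len(embeddings) - 1):  # single pass over T-1 transitions
        z_hat = predict_next(embeddings[t], actions[t], W_z, W_a)
        target = embeddings[t + 1]
        total += sum((p - q) ** 2 for p, q in zip(z_hat, target))
        count += len(target)
    return total / count
```

In this toy setting, a predictor that exactly reproduces the scene dynamics yields a loss of zero, and any mismatch is penalized in embedding space rather than in pixel space.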


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
