arXiv

DVGT: Driving Visual Geometry Transformer

Abstract: Accurately perceiving and reconstructing 3D scene geometry from visual data is a fundamental requirement for autonomous driving systems. Despite its importance, there remains a scarcity of dense geometry perception models specifically designed for driving environments that can effectively adapt to varying scenarios and diverse camera setups. To address this limitation, we introduce the Driving Visual Geometry Transformer (DVGT), a novel approach that reconstructs a global, dense 3D point map from a sequence of multi-view visual inputs without requiring known poses.

Our method first extracts visual features from each image using a DINO backbone. It then alternates three kinds of attention layers (intra-view local attention, cross-view spatial attention, and cross-frame temporal attention) to infer geometric relationships across the image sequence. Finally, multiple decoding heads produce a global point map in the ego coordinate system of the first frame and estimate the ego pose of every frame.
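To make the three attention types concrete, the sketch below lists which tokens would attend to one another under each of them. It is a minimal illustration and not the DVGT implementation: the helper `attention_groups`, the (frame, view, patch) token indexing, and in particular the assumption that temporal attention links tokens of the same view across frames are all hypothetical choices made for this example.

```python
from itertools import product


def attention_groups(num_frames, num_views, num_patches, mode):
    """Group tokens that attend to one another under a given attention mode.

    Hypothetical helper for illustration only. Each token is identified
    by a (frame, view, patch) index triple.
    """
    if mode == "intra_view":
        # Local attention: tokens of one image (same frame, same view).
        key = lambda tok: (tok[0], tok[1])
    elif mode == "cross_view":
        # Spatial attention: all views captured in the same frame.
        key = lambda tok: (tok[0],)
    elif mode == "cross_frame":
        # Temporal attention: ASSUMED here to link the same view over time.
        key = lambda tok: (tok[1],)
    else:
        raise ValueError(f"unknown attention mode: {mode}")

    groups = {}
    for tok in product(range(num_frames), range(num_views), range(num_patches)):
        groups.setdefault(key(tok), []).append(tok)
    return list(groups.values())


if __name__ == "__main__":
    # Toy setup: 2 frames, 3 cameras, 4 patch tokens per image.
    for mode in ("intra_view", "cross_view", "cross_frame"):
        groups = attention_groups(2, 3, 4, mode)
        print(mode, "->", len(groups), "groups of", len(groups[0]), "tokens")
```

In a real model each group would correspond to one attention call over a reshaped token tensor; alternating the three groupings lets information flow first within an image, then across cameras, then across time.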

Distinct from traditional approaches that depend on precise camera parameters, DVGT operates without explicit 3D geometric priors. This characteristic allows for the flexible processing of arbitrary camera configurations. Furthermore, DVGT directly predicts metric-scaled geometry from image sequences, thereby removing the necessity for post-alignment with external sensors.
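The output convention (a global point map expressed in the first frame's ego coordinates, plus an ego pose per frame) can be illustrated with a small rigid-transform sketch. DVGT predicts the global point map directly, so this code is not part of the method; it only shows how a point seen in a later frame's ego coordinates relates to the first frame's coordinates. The 4x4 pose layout, the yaw-only rotation, and the helper names `make_pose` and `to_first_frame` are assumptions made for this example.

```python
import math


def make_pose(yaw, tx, ty, tz):
    """Build a 4x4 rigid transform mapping frame-t ego coords to frame-0 ego coords.

    Hypothetical parameterization: rotation about the z axis by `yaw`
    (radians) followed by a translation in metres.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def to_first_frame(pose_t_to_0, point):
    """Apply the pose to a 3D point given in frame-t ego coordinates."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows are needed for the transformed xyz.
    return tuple(
        sum(row[i] * homogeneous[i] for i in range(4))
        for row in pose_t_to_0[:3]
    )


if __name__ == "__main__":
    # The ego vehicle has driven 5 m forward (along x) without turning.
    pose = make_pose(yaw=0.0, tx=5.0, ty=0.0, tz=0.0)
    # A point 10 m ahead of the vehicle at frame t ...
    print(to_first_frame(pose, (10.0, 0.0, 0.0)))  # ... is 15 m ahead in frame 0.
```

Because the predicted geometry is metric, the translations in such poses are in metres and no scale alignment against an external sensor is needed.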

Evaluated on a comprehensive mixture of driving datasets, including nuScenes, OpenScene, Waymo, KITTI, and DDAD, DVGT demonstrates significant performance improvements over existing models across various scenarios. The source code has been made publicly available at https://github.com/wzzheng/DVGT.


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC