arXiv

A Cookbook of 3D Vision: Data, Learning Paradigms, and Application

June 4, 2026 · Hongyang Du, Zongxia Li, Dawei Liu, Runhao Li, Haoyuan Song, Qingyu Zhang, Yubo Wang, Jingcheng Ni, Shihang Gui, Congchao Dong, Tao Hu

Title: A Comprehensive Guide to 3D Vision: Data, Learning Paradigms, and Applications

Abstract:

The domain of 3D vision has undergone rapid transformation, propelled by a growing variety of data representations, learning frameworks, and modeling techniques. Despite this progress, the field remains fragmented across representations and benchmarks, which makes it hard to establish a unified understanding of efficiency, fidelity, and scalability. To address this, our study introduces a data-centric taxonomy for 3D vision: a cohesive conceptual map that links geometric representations, datasets, learning frameworks, and practical applications.

We start by evaluating the primary structural formats of 3D data (including point clouds, meshes, voxels, and 3D Gaussians) alongside their respective acquisition methods. Subsequently, we investigate how dataset architecture, benchmark design, and supervision strategies have influenced recent developments, such as 2D-supervised 3D learning, implicit neural representations, and 4D world modeling. By adopting this integrative perspective, we clarify the connections between representations, learning approaches, and downstream tasks like reconstruction, generation, and video modeling. This work delivers a consolidated overview of current trends, highlighting the ongoing efforts to balance efficiency with fidelity and to achieve multimodal geometric grounding.
