Title: Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation
Abstract:
Multi-modal 3D Intelligence has attracted significant interest, particularly for its broad applicability in domains such as world simulation and autonomous driving. By incorporating an extra modality beyond traditional single-modal 3D perception, these systems not only enhance the accuracy and depth of scene understanding but also establish a robust basis for complex interactions with the physical world. This capability is vital in diverse and difficult settings where 3D data alone proves insufficient. Despite a notable increase in multi-modal 3D methodologies over the last six years, particularly those combining multi-camera imagery (3D+2D) and text (3D+language), there remains a lack of thorough, holistic reviews in this area. To address this void, this study offers a systematic overview of recent developments. We start by outlining the distinct challenges associated with various 3D multi-modal tasks. Subsequently, we introduce a new classification framework that organizes current methods based on their modalities and specific tasks, while examining their respective advantages and drawbacks. The paper also provides a comparative analysis of recent techniques across multiple benchmark datasets, accompanied by detailed insights. Finally, we highlight existing open problems and suggest promising directions for future inquiry.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




