SAM 3D: 3Dfy Anything in Images
Title: SAM 3D: Transforming Images into 3D Models with Ease
Abstract:
This paper introduces SAM 3D, a generative model for visually grounded 3D object reconstruction. From a single input image, it predicts an object's geometry, texture, and spatial layout. SAM 3D performs especially well on natural images, where occlusion and scene clutter are common, because it draws on recognition cues from the surrounding visual context.
To support this capability, we developed a hybrid annotation pipeline that keeps both humans and models in the loop. This approach allowed us to produce 3D reconstruction data (object shape, texture, and pose) at an unprecedented scale. We trained the model with a modern, multi-stage framework that combines synthetic pretraining with real-world alignment, thereby overcoming the long-standing "data barrier" in 3D modeling.
Our results show substantial gains over recent work: in human preference tests on reconstructions of real-world objects and scenes, SAM 3D is preferred at a ratio of at least 5:1. We will release the source code, model weights, an online demo, and a new, rigorous benchmark for in-the-wild 3D object reconstruction.
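To make the 5:1 figure concrete, the short sketch below converts a win ratio into the share of comparisons won. It assumes the ratio counts only decisive comparisons (ties excluded), which the abstract does not state; the helper name `win_share` is hypothetical and is not part of any SAM 3D release.

```python
from fractions import Fraction


def win_share(wins: int, losses: int) -> Fraction:
    """Share of decisive comparisons won, given a wins:losses ratio.

    Assumption: ties are excluded from both counts.
    """
    if wins < 0 or losses < 0 or wins + losses == 0:
        raise ValueError("need non-negative counts and at least one comparison")
    return Fraction(wins, wins + losses)


# A 5:1 win rate means 5 wins for every 1 loss.
share = win_share(5, 1)
print(f"{float(share):.1%}")  # prints 83.3%
```

Under that assumption, "at least 5:1" corresponds to raters preferring SAM 3D in roughly 83% or more of decisive comparisons.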
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




