From None to All: Self-Supervised 3D Reconstruction via Novel View Synthesis
Title: Achieving Comprehensive 3D Reconstruction from Sparse Inputs via Self-Supervised Novel View Synthesis
Abstract:
This study presents NAS3R, a novel self-supervised, feed-forward architecture designed to simultaneously infer explicit 3D geometry and camera parameters without relying on ground-truth labels or pretrained priors. The training process involves reconstructing 3D Gaussians from context views that lack calibration or pose information, subsequently rendering target viewpoints using the camera parameters predicted by the model itself. This approach allows for self-supervised optimization driven by 2D photometric loss. To guarantee stable convergence, the framework embeds both reconstruction and camera estimation within a unified transformer backbone, governed by masked attention mechanisms. Additionally, it employs a depth-based Gaussian formulation to ensure well-conditioned optimization. While fully functional in a self-supervised setting, NAS3R remains compatible with leading supervised 3D reconstruction models, allowing for the integration of pretrained priors or intrinsic camera data when such information is accessible. Comprehensive experimental results demonstrate that NAS3R outperforms existing self-supervised approaches, offering a scalable, geometry-aware solution for 3D reconstruction from unstructured data. The source code and trained models are publicly accessible at https://ranrhuang.github.io/nas3r/.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
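
The abstract mentions a depth-based Gaussian formulation and a 2D photometric loss but does not spell out either. The following is a minimal sketch of one common way such pieces are built, not the NAS3R implementation: each pixel is lifted to a camera-space 3D point (a candidate Gaussian center) from a predicted depth under an assumed pinhole model, and rendered and target pixels are compared with a mean squared error. The function names `unproject_depth` and `photometric_mse`, the pinhole parameters `fx`, `fy`, `cx`, `cy`, the integer pixel-coordinate convention, and the choice of MSE are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3]:
    """Lift every pixel (u, v) with depth d to a camera-space 3D point.

    Assumes a pinhole camera (hypothetical intrinsics fx, fy, cx, cy) and
    integer pixel coordinates; points are returned in row-major order.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            x = (u - cx) / fx * d  # back-project along the pixel ray
            y = (v - cy) / fy * d
            points.append((x, y, d))
    return points


def photometric_mse(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between flattened rendered and target pixels."""
    if len(rendered) != len(target):
        raise ValueError("rendered and target must have the same length")
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)


if __name__ == "__main__":
    # A 2x2 depth map at constant depth 2.0 with toy intrinsics.
    centers = unproject_depth([[2.0, 2.0], [2.0, 2.0]], 1.0, 1.0, 0.5, 0.5)
    print(centers)
    print(photometric_mse([0.0, 1.0], [0.0, 0.0]))
```

Tying each center to a single depth value along a fixed pixel ray leaves one degree of freedom per Gaussian position, which is one plausible reading of why a depth-based formulation would be better conditioned than predicting free 3D coordinates.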





