Fast-SAM3D: 3Dfy Anything in Images but Faster
Title: Fast-SAM3D: Accelerating 3Dfy Anything for Images
Abstract: While SAM3D facilitates scalable, open-world 3D reconstruction from intricate scenes, its practical application is currently limited by excessive inference latency. This study presents the first comprehensive analysis of SAM3D’s inference behavior, uncovering that standard acceleration techniques are ineffective in this specific context. We identify that these limitations arise from overlooking the pipeline’s inherent multi-level heterogeneity, which includes the kinematic differences between shape and layout, the natural sparsity involved in texture refinement, and the spectral variance present across various geometries. To overcome these challenges, we introduce Fast-SAM3D, a training-free framework that dynamically matches computational resources to the immediate complexity of the generation process. Our method incorporates three mechanisms designed to account for this heterogeneity: (1) Modality-Aware Step Caching, which separates structural evolution from sensitive layout adjustments; (2) Joint Spatiotemporal Token Carving, which directs refinement efforts toward regions with high entropy; and (3) Spectral-Aware Token Aggregation, which adjusts the resolution of the decoding phase. Comprehensive experiments show that Fast-SAM3D achieves an end-to-end speedup of up to 2.67$\times$ with minimal impact on quality, setting a new Pareto-optimal point for efficient single-view 3D generation. The source code is available at https://github.com/wlfeng0509/Fast-SAM3D.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
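
The abstract names Modality-Aware Step Caching but does not spell out its mechanics. As a rough illustration of the general idea of step caching in an iterative sampler, the sketch below reuses the previous model output while the input has drifted less than a threshold, with a separate threshold per modality. It is not the Fast-SAM3D implementation: `StepCache`, `toy_model`, `run`, the relative-change criterion, and the threshold values are all hypothetical choices made for this example.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def relative_change(current: Vector, previous: Vector) -> float:
    """L1 relative change between two equally sized vectors."""
    numerator = sum(abs(c - p) for c, p in zip(current, previous))
    denominator = sum(abs(p) for p in previous) + 1e-12
    return numerator / denominator


class StepCache:
    """Reuses the last model output while the input drifts less than `threshold`.

    Hypothetical helper for illustration; not taken from the Fast-SAM3D code.
    """

    def __init__(self, model: Callable[[Vector], Vector], threshold: float) -> None:
        self.model = model
        self.threshold = threshold
        self.last_input: Optional[Vector] = None
        self.last_output: Optional[Vector] = None
        self.calls = 0  # number of real (uncached) model evaluations

    def __call__(self, x: Vector) -> Vector:
        if (
            self.last_input is not None
            and self.last_output is not None
            and relative_change(x, self.last_input) < self.threshold
        ):
            return self.last_output  # cache hit: skip the model call
        self.calls += 1
        self.last_input = list(x)  # drift is measured from the last real evaluation
        self.last_output = self.model(x)
        return self.last_output


def toy_model(x: Vector) -> Vector:
    """Stand-in for an expensive network: a simple linear velocity field."""
    return [-0.5 * v for v in x]


def run(threshold: float, steps: int = 20, step_size: float = 0.1) -> Tuple[Vector, int]:
    """Runs a toy Euler sampler and returns the final state and model-call count."""
    cache = StepCache(toy_model, threshold)
    x = [1.0, -2.0, 0.5]
    for _ in range(steps):
        velocity = cache(x)
        x = [xi + step_size * vi for xi, vi in zip(x, velocity)]
    return x, cache.calls


if __name__ == "__main__":
    # Assumed thresholds: a tolerant one for "shape", a strict one for "layout".
    thresholds = {"shape": 0.2, "layout": 0.0}
    for modality, threshold in thresholds.items():
        _, calls = run(threshold)
        print(f"{modality}: {calls} model calls over 20 steps")
```

A threshold of 0.0 disables reuse entirely, which mirrors the intuition that a sensitive quantity should be recomputed every step, while a looser threshold trades some accuracy for fewer model evaluations.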





