4D Reconstruction from Sparse Dynamic Cameras
Original: arXiv:2606.04593v1 Announce Type: new
Abstract:
Although recent work has advanced dynamic 3D (or 4D) reconstruction from monocular dynamic cameras, these approaches still struggle with inherent depth ambiguity. This study explores a more practical alternative: a sparse dynamic camera configuration, in which a small number of independently moving cameras record the same subjects. This setup keeps capture costs low while providing multi-view constraints, making it suitable for real-world video production such as sports, concerts, and television broadcasts.
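The depth ambiguity of a single camera, and why a second moving camera helps, can be seen in a toy pinhole model. This is an illustration only, not part of the paper; the intrinsics `K`, the two camera poses, and the helper `project` are hypothetical values chosen for the example.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)  # homogeneous image point
    return x[:2] / x[2]

# Hypothetical pinhole intrinsics shared by both cameras (focal 500 px, principal point (320, 240)).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Camera A sits at the origin looking down +z; camera B is shifted 1 unit along +x.
P_a = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_b = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Two candidate points on the same ray through camera A's center, at depths 4 and 8.
X_near = np.array([0.5, 0.25, 4.0])
X_far = 2.0 * X_near

# Camera A alone cannot tell them apart: both map to pixel (382.5, 271.25).
print(project(P_a, X_near), project(P_a, X_far))
# The displaced camera B sees them at different pixels, (257.5, 271.25) vs (320, 271.25),
# so the two views together fix the depth.
print(project(P_b, X_near), project(P_b, X_far))
```

With one camera, any depth along the viewing ray is consistent with the observation; a second viewpoint with a nonzero baseline turns that ambiguity into a solvable triangulation problem.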
However, our experiments indicate that simply adapting existing monocular or dense fixed-camera techniques is inadequate: these naive extensions fail to address the complex spatiotemporal inconsistencies that arise across views and over time. To bridge this gap, we introduce a straightforward yet effective 3D track initialization method that ensures spatiotemporal consistency by combining inter-camera feature matching with intra-camera point tracking. We further improve optimization stability and cross-view generalization with a noise-robust depth-ordering regularization loss and a spatiotemporally diverse batch sampling strategy.
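One plausible reading of the track initialization: a cross-camera feature match ties a point's observations together at one frame, within-camera point tracking carries each observation through time, and the per-frame observations from all cameras are lifted to 3D. The sketch below assumes known per-frame projection matrices and uses standard linear (DLT) triangulation for the lifting step; that choice, the names `triangulate_dlt` and `init_3d_track`, and the synthetic scene are hypothetical and are not the authors' implementation.

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera matrices; pixels: matching (u, v) observations.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view adds two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def init_3d_track(cam_poses, tracks_2d):
    """Hypothetical 3D track initialization for a single scene point.

    cam_poses: dict cam_id -> array (T, 3, 4) of per-frame projection matrices
               (the cameras move, so every frame has its own matrix).
    tracks_2d: dict cam_id -> array (T, 2), the point's 2D track in that camera,
               assumed to come from a cross-camera match at one frame that is
               then propagated through time by within-camera point tracking.
    Returns an array of shape (T, 3): one 3D position per frame.
    """
    cams = sorted(tracks_2d)
    num_frames = len(tracks_2d[cams[0]])
    traj = np.empty((num_frames, 3))
    for t in range(num_frames):
        Ps = [cam_poses[c][t] for c in cams]
        uvs = [tracks_2d[c][t] for c in cams]
        traj[t] = triangulate_dlt(Ps, uvs)
    return traj


# Synthetic check: a point moving along x, observed by two translating cameras.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

def make_P(cx):
    """Camera centered at (cx, 0, 0), looking down +z."""
    return K @ np.hstack([np.eye(3), np.array([[-cx], [0.0], [0.0]])])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

T = 5
gt = np.array([[0.1 * t, 0.2, 5.0] for t in range(T)])
cam_poses = {
    "cam0": np.stack([make_P(0.05 * t) for t in range(T)]),
    "cam1": np.stack([make_P(1.0 + 0.05 * t) for t in range(T)]),
}
tracks_2d = {c: np.stack([project(cam_poses[c][t], gt[t]) for t in range(T)])
             for c in cam_poses}
traj = init_3d_track(cam_poses, tracks_2d)
print(np.allclose(traj, gt))  # noise-free observations, so this should be True
```

Because the same 2D track identity is reused at every frame, the resulting 3D trajectory is consistent across views (one point per frame, triangulated jointly) and over time (the same point throughout). Real tracks are noisy and may drop out in some cameras, which this sketch does not handle.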
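The abstract does not define the depth-ordering loss, so the following is only one common way to build such a term: a pairwise hinge (ranking) loss that trusts only the relative order of a noisy depth prior and skips pairs whose prior depths are nearly tied, since their order is unreliable under noise. The function name, the parameters `tau` and `margin`, and the use of a monocular depth estimate as the prior are assumptions. It is written in NumPy for readability; a training loop would compute the same expression in an autodiff framework.

```python
import numpy as np

def depth_order_loss(rendered, prior, pairs, tau=0.05, margin=1e-3):
    """Hinge-style depth-ordering loss (hypothetical formulation).

    rendered: (N,) depths from the reconstruction at sampled pixels.
    prior:    (N,) noisy reference depths (e.g. a monocular depth estimate)
              at the same pixels; only their ordering is used.
    pairs:    (M, 2) integer index pairs (i, j) to compare.
    tau:      pairs whose prior depths differ by less than tau are ignored,
              which is what makes this sketch robust to prior noise.
    margin:   depth gap (in rendered units) a correctly ordered pair needs
              before its loss reaches zero.
    """
    i, j = pairs[:, 0], pairs[:, 1]
    diff_prior = prior[i] - prior[j]
    keep = np.abs(diff_prior) >= tau  # drop near-ties where the prior order is unreliable
    if not np.any(keep):
        return 0.0
    sign = np.sign(diff_prior[keep])  # +1 if the prior says i is farther than j
    diff_rendered = rendered[i][keep] - rendered[j][keep]
    # Penalize rendered pairs whose order disagrees with the prior or is within the margin.
    return float(np.mean(np.maximum(0.0, margin - sign * diff_rendered)))


prior = np.array([1.0, 2.0, 2.02, 3.0])
rendered = np.array([1.1, 2.5, 1.9, 2.0])
pairs = np.array([[0, 1], [1, 2], [1, 3], [2, 3]])
loss = depth_order_loss(rendered, prior, pairs)
print(loss)
```

Here pair (1, 2) is dropped because its prior depths differ by only 0.02; of the three remaining pairs, only (1, 3) is rendered in the wrong order, contributing 0.501, so the loss is 0.501 / 3. Setting `tau=0.0` would also penalize the near-tie (1, 2), illustrating how a hard ordering constraint can be driven by prior noise.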
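The abstract gives no details of the batch sampling strategy either. A simple interpretation of "spatiotemporally diverse" is a sampler that spreads each batch across the whole timeline and across all cameras, rather than drawing nearby frames from one view. The sketch below stratifies time into equal bins and assigns cameras round-robin; `sample_diverse_batch` and its parameters are hypothetical, and the four-camera usage simply mirrors the rig size described for LetCamsGo.

```python
import random

def sample_diverse_batch(num_cams, num_frames, batch_size, rng=random):
    """Hypothetical spatiotemporally diverse sampler.

    Splits the timeline into batch_size equal-width strata and draws one frame
    from each, so the batch spans the whole sequence; cameras are assigned
    round-robin over a shuffled order so each camera appears as evenly as
    possible. Returns a list of (camera_index, frame_index) pairs.
    """
    if batch_size > num_frames:
        raise ValueError("batch_size cannot exceed num_frames in this sketch")
    cams = list(range(num_cams))
    rng.shuffle(cams)
    batch = []
    for k in range(batch_size):
        # Stratum k covers frames [start, end); strata are disjoint and non-empty.
        start = k * num_frames // batch_size
        end = (k + 1) * num_frames // batch_size
        t = rng.randrange(start, end)
        batch.append((cams[k % num_cams], t))
    return batch


rng = random.Random(0)
batch = sample_diverse_batch(num_cams=4, num_frames=300, batch_size=8, rng=rng)
print(batch)
```

Every sampled frame index is distinct and falls in its own time bin, and with 8 samples over 4 cameras each camera is used exactly twice, so a single gradient step sees several viewpoints at widely separated times.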
Moreover, to tackle the absence of standardized benchmarks for this specific task, we present LetCamsGo. This new real-world video dataset comprises five sequences filmed across four distinct environments, captured by three independently moving cameras and one stationary camera. Extensive benchmarking on LetCamsGo reveals that our proposed framework significantly improves the quality of 4D reconstruction in dynamic areas compared to baseline methods. These findings lay the groundwork for a cost-effective 4D reconstruction paradigm applicable in unstructured, real-world settings.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






