arXiv

PatchScene: Patch-based Voxel Diffusion for Large-Scale Scene Completion

June 3, 2026 · Qingdong Xu, Jiajun Zhu, Shilin Zhu, Xinjing He, Chao Lu, Huanran Wang, Jiyao Zhang

Title: PatchScene: Leveraging Patch-Based Voxel Diffusion for Comprehensive Large-Scale Scene Reconstruction

Abstract:

This paper introduces PatchScene, a diffusion-based framework for large-scale LiDAR scene completion. Unlike conventional approaches that rely on dense voxel grids or global latent representations, PatchScene uses a patch-based voxel diffusion strategy that explicitly generates fine-grained geometric detail within localized 3D regions. To keep the reconstruction coherent across both space and time, we develop a confidence-guided spatio-temporal fusion mechanism that combines adjacent frames and overlapping patches within a single, unified generative workflow. We also propose an Annular-Flow diffusion strategy that exploits the radial density characteristics of LiDAR scans: high-fidelity information is propagated progressively from near-range to far-range zones, enabling scene completion without spatial boundaries.

Extensive evaluation on the SemanticKITTI benchmark shows that PatchScene achieves state-of-the-art results across all standard metrics, outperforming prior methods in both temporal consistency and geometric precision. The model also scales and generalizes well to real-world autonomous driving scenarios: a model trained exclusively on LiDAR data with a 20-meter range generalizes effectively to 50-meter scenes without retraining.
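
The abstract gives no formulas for the confidence-guided fusion or the Annular-Flow ordering, so the following is only a minimal sketch of how those two ideas might look in code, not the authors' implementation. It assumes that overlapping patches are merged by a per-voxel confidence-weighted average and that patches are processed in concentric rings around the sensor, nearest ring first. The function names `fuse_patches` and `annular_order`, the `ring_width` parameter, and the dictionary-based voxel representation are all hypothetical.

```python
import math
from collections import defaultdict


def fuse_patches(patches):
    """Merge overlapping patches by confidence-weighted averaging (assumed rule).

    patches: list of dicts mapping a voxel index (x, y, z) to a tuple
             (occupancy, confidence), both floats.
    Returns a dict mapping each voxel index to its fused occupancy.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for patch in patches:
        for voxel, (occupancy, confidence) in patch.items():
            weighted_sum[voxel] += confidence * occupancy
            weight_total[voxel] += confidence
    # Voxels whose total confidence is zero carry no information and are skipped.
    return {
        voxel: weighted_sum[voxel] / weight_total[voxel]
        for voxel in weighted_sum
        if weight_total[voxel] > 0
    }


def annular_order(patch_centers, ring_width):
    """Group patch centers into concentric rings around the sensor at the origin.

    patch_centers: list of (x, y) positions in meters.
    ring_width:    radial width of each ring in meters (hypothetical parameter).
    Returns a list of rings, nearest first; each ring is a list of centers.
    """
    rings = defaultdict(list)
    for cx, cy in patch_centers:
        ring_index = int(math.hypot(cx, cy) // ring_width)
        rings[ring_index].append((cx, cy))
    return [rings[index] for index in sorted(rings)]


# Usage: two patches that overlap at voxel (1, 0, 0).
patch_a = {(0, 0, 0): (1.0, 0.9), (1, 0, 0): (0.8, 0.5)}
patch_b = {(1, 0, 0): (0.2, 0.5), (2, 0, 0): (0.6, 0.3)}
fused = fuse_patches([patch_a, patch_b])

# Usage: schedule four patches from near range to far range.
schedule = annular_order([(30, 0), (5, 0), (0, 12), (3, 4)], ring_width=10)
```

In this sketch a voxel covered by a single patch keeps that patch's occupancy, while a voxel covered by several patches leans toward the more confident prediction. Processing `schedule` ring by ring would let each outer ring be conditioned on the already completed inner rings, which is one plausible reading of the near-to-far propagation described above.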
