UnsOcc: 3D Semantic Occupancy Prediction in Unstructured Scene via Rendering Fusion
Title: UnsOcc: Achieving 3D Semantic Occupancy Prediction in Unstructured Environments Through Rendering Fusion
Abstract
Autonomous driving systems face distinct difficulties in unstructured environments, where irregular obstacles and sparse layouts render conventional perception techniques, such as 3D object detection, less effective. In response, 3D semantic occupancy prediction has gained significant attention for its capacity to generate dense spatial representations by labeling individual 3D voxels with semantic information. Nevertheless, applying this method directly to unstructured settings is difficult; scene sparsity impedes efficient cross-modal fusion, while the pronounced long-tail distribution in these contexts further compromises prediction accuracy.
To demonstrate the efficacy of our proposed solution, we have assembled a specialized dataset comprising unstructured scenes gathered from open-pit mines. Leveraging this data, we introduce UnsOcc, a multi-modal framework designed for 3D semantic occupancy prediction that enhances robustness within unstructured environments. The core of our approach is the RenderFusion module, which utilizes bidirectional rendering supervision to strengthen cross-modal feature alignment. Additionally, we present GSRefinement, an auxiliary supervision technique that leverages Gaussian Splatting for detail awareness. This method projects sparse 3D occupancy predictions into dense 2D semantic segmentation maps, thereby facilitating effective supervision for long-tail categories. Comprehensive evaluations conducted on both the open-pit mine dataset and the nuScenes dataset reveal that our method substantially surpasses current state-of-the-art techniques.
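The idea of turning sparse 3D occupancy predictions into a dense 2D semantic map can be illustrated with a minimal sketch. This is not the authors' GSRefinement implementation: Gaussian Splatting is replaced here by plain front-to-back alpha compositing of voxels along orthographic rays, which shares only the compositing principle. The function names (`composite_ray`, `render_semantic_map`), the grid layout, and the `empty_label` convention are all assumptions made for illustration.

```python
def composite_ray(alphas, class_probs, num_classes):
    """Composite the samples of one ray front to back.

    alphas: opacity in [0, 1] per sample, nearest sample first.
    class_probs: per-sample class distribution, in the same order.
    Returns (rendered class weights, accumulated opacity).
    """
    transmittance = 1.0  # fraction of the ray not yet absorbed
    rendered = [0.0] * num_classes
    for alpha, probs in zip(alphas, class_probs):
        weight = transmittance * alpha  # contribution of this sample
        for c in range(num_classes):
            rendered[c] += weight * probs[c]
        transmittance *= 1.0 - alpha
    return rendered, 1.0 - transmittance


def render_semantic_map(alpha_grid, prob_grid, num_classes, empty_label=-1):
    """Render a 2D label map from a voxel grid (hypothetical layout).

    alpha_grid[z][y][x] is the voxel opacity and prob_grid[z][y][x] is the
    voxel class distribution. Rays are orthographic along z, with z = 0
    nearest to the viewer. Pixels whose ray hits nothing get empty_label.
    """
    depth = len(alpha_grid)
    height = len(alpha_grid[0])
    width = len(alpha_grid[0][0])
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            alphas = [alpha_grid[z][y][x] for z in range(depth)]
            probs = [prob_grid[z][y][x] for z in range(depth)]
            rendered, opacity = composite_ray(alphas, probs, num_classes)
            if opacity < 1e-6:
                row.append(empty_label)
            else:
                row.append(max(range(num_classes), key=lambda c: rendered[c]))
        label_map.append(row)
    return label_map
```

In a training setup, a rendered map of this kind would be compared against a 2D segmentation label, so that every pixel covering a rare class contributes a supervision signal even when that class occupies few voxels. A differentiable version would keep the rendered class weights rather than taking the argmax.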
Source: arXiv





