SAVMap: Structure-Aided Visual Mapping of Large-Scale 2.5D Manhattan Wireframes from Panoramic Video
Title: SAVMap: Leveraging Structure for Visual Mapping of Large-Scale 2.5D Manhattan Wireframes from Panoramic Video
Abstract: Accurate three-dimensional modeling of industrial settings is essential for applications like robot localization and the creation of digital twins. This paper introduces SAVMap, a novel approach that constructs semantic wireframe maps of warehouse shelving and lighting infrastructure using a panoramic video camera as the only input sensor. The method processes sequences of rectified images, covering both shelf and ceiling views, that are extracted from panoramic footage recorded along warehouse aisles. In the first stage, a semantic segmentation network identifies and tracks sparse semantic structure feature points (such as shelf corners and light centers) across the image sequences. A constrained structure-from-motion algorithm then incorporates real-world geometric constraints, specifically Manhattan grids, to compute the 3D coordinates needed to generate a wireframe map. We validate the scalability and precision of this approach in a facility comprising 46 shelving rows, with each face measuring 55 meters by 7 meters. Using one hour of panoramic video, the system generates wireframe maps for more than 5,000 shelf elements, achieving a mean absolute error of 4.8 cm against ground-truth measurements.
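
To illustrate the idea of a Manhattan-grid constraint and the mean-absolute-error metric mentioned in the abstract, the following is a minimal one-dimensional sketch, not the authors' constrained structure-from-motion algorithm. It assumes that each noisy point estimate (for example, a shelf corner position along an aisle) already carries a known integer grid index, and it fits a regular grid `origin + index * spacing` by ordinary least squares. The function names, the sample coordinates, and the 2.7 m spacing are all hypothetical.

```python
from statistics import fmean


def fit_regular_grid(indices, coords):
    """Least-squares fit of coords[i] ~ origin + indices[i] * spacing.

    indices: integer grid positions (assumed known, e.g. from tracking).
    coords:  noisy 1D coordinate estimates in meters.
    """
    if len(indices) != len(coords) or len(indices) < 2:
        raise ValueError("need at least two (index, coordinate) pairs")
    k_mean = fmean(indices)
    x_mean = fmean(coords)
    var_k = sum((k - k_mean) ** 2 for k in indices)
    if var_k == 0:
        raise ValueError("indices must not all be equal")
    cov = sum((k - k_mean) * (x - x_mean) for k, x in zip(indices, coords))
    spacing = cov / var_k
    origin = x_mean - spacing * k_mean
    return origin, spacing


def snap_to_grid(indices, origin, spacing):
    """Replace each noisy estimate with its position on the fitted grid."""
    return [origin + k * spacing for k in indices]


def mean_absolute_error(estimates, references):
    """Mean absolute difference between estimates and reference values."""
    return fmean(abs(e - r) for e, r in zip(estimates, references))


if __name__ == "__main__":
    # Hypothetical data: four shelf uprights, nominally 2.7 m apart.
    idx = [0, 1, 2, 3]
    noisy = [1.03, 3.68, 6.42, 9.08]
    truth = [1.0, 3.7, 6.4, 9.1]
    origin, spacing = fit_regular_grid(idx, noisy)
    snapped = snap_to_grid(idx, origin, spacing)
    print("MAE before snapping:", mean_absolute_error(noisy, truth))
    print("MAE after snapping: ", mean_absolute_error(snapped, truth))
```

The sketch shows only why a shared grid model can help: many points constrain two parameters, so independent per-point noise is partly averaged out. A full system would estimate camera poses and 3D structure jointly, which this example does not attempt.
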
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





