Belief Consistency Between Foundation-Model Evidence and Geometric Perception in Persistent Robotic Maps
Title: Harmonizing Foundation-Model Evidence with Geometric Perception in Durable Robotic Mapping
Abstract:
Autonomous robots that rely on persistent maps increasingly combine two distinct information streams: a geometric perception layer that makes well-defined assertions about the environment, and a foundation-model layer that produces semantic interpretations of the same environment without calibrated reliability. Current mapping architectures typically merge these channels by treating the foundation model as one more voter in a per-element posterior distribution. These systems often ignore the model’s per-class reliability, and they lack mechanisms to detect or handle contradictions between the two channels in real time.
To address these limitations, we introduce an update operator with two complementary components: a per-class calibrated commitment gate and a per-event conflict-drop window. The conflict-drop window prevents the system from accepting foundation-model assertions that the geometric channel immediately contradicts. We validated the approach on the KITTI-360 and ScanNet datasets, using both an oracle geometric channel (based on panoptic ground truth) and a standard online semantic segmenter (Mask2Former) to assess real-world efficacy.
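The two components described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operator defined in the paper: the names `MapElement`, `on_fm_assertion`, `on_geometric_observation`, and `commit_ready`, the per-class threshold table, and the time-based window are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MapElement:
    """One map element; the fields are hypothetical, chosen for illustration."""
    committed: Optional[str] = None
    pending: list = field(default_factory=list)  # (label, arrival_time) pairs


def on_fm_assertion(elem, label, confidence, t, thresholds):
    """Commitment gate (assumed form): queue the assertion only if its
    calibrated confidence clears the threshold for that class."""
    if confidence >= thresholds.get(label, float("inf")):
        elem.pending.append((label, t))
        return True
    return False


def on_geometric_observation(elem, geo_label, t, window):
    """Conflict drop (assumed form): discard pending assertions that the
    geometric channel contradicts while their window is still open."""
    elem.pending = [
        (lbl, t0) for (lbl, t0) in elem.pending
        if lbl == geo_label or t - t0 > window
    ]


def commit_ready(elem, t, window):
    """Commit the most recent pending assertion whose window has elapsed
    without a contradiction, then clear the elapsed entries."""
    ready = [(lbl, t0) for (lbl, t0) in elem.pending if t - t0 > window]
    if ready:
        elem.committed = ready[-1][0]
        elem.pending = [p for p in elem.pending if p not in ready]
    return elem.committed


if __name__ == "__main__":
    thresholds = {"car": 0.9}  # illustrative value, not from the paper
    elem = MapElement()
    on_fm_assertion(elem, "car", 0.95, t=0.0, thresholds=thresholds)
    on_geometric_observation(elem, "road", t=0.5, window=1.0)
    print(commit_ready(elem, t=2.0, window=1.0))
```

In this sketch an assertion must pass both checks before it reaches the committed map: the gate filters by class-specific reliability, and the window gives the geometric channel a chance to veto.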
Our results show that this operator yields substantially more accurate committed maps. On KITTI, car commitment precision reached 99.7% with our method, compared with 43.9% for an operator relying solely on calibration, and the mean per-class Intersection over Union (IoU) improved from 0.180 to 0.522. The framework also retains more compositional true positives, at higher precision, than monolithic compositional vision-language model (VLM) prompts. It maintains deployment-grade quality with both oracle and off-the-shelf geometric channels and remains robust to the choice of foundation model.
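For readers unfamiliar with the two reported metrics, the sketch below computes them under their standard definitions. The paper's exact evaluation protocol is not given here, so the function names, the per-element label lists, and the use of `None` for uncommitted elements are assumptions.

```python
def commitment_precision(committed, truth, cls):
    """Fraction of elements committed as `cls` whose true label is `cls`."""
    n_committed = sum(1 for c in committed if c == cls)
    n_correct = sum(1 for c, g in zip(committed, truth) if c == cls and g == cls)
    return n_correct / n_committed if n_committed else float("nan")


def mean_per_class_iou(pred, truth, classes):
    """Average of intersection/union over classes that appear at least once."""
    ious = []
    for k in classes:
        inter = sum(1 for p, g in zip(pred, truth) if p == k and g == k)
        union = sum(1 for p, g in zip(pred, truth) if p == k or g == k)
        if union:  # skip classes absent from both prediction and truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy labels for four map elements; None marks an uncommitted element.
    committed = ["car", "car", "road", None]
    truth = ["car", "road", "road", "road"]
    print(commitment_precision(committed, truth, "car"))
    print(mean_per_class_iou(committed, truth, ["car", "road"]))
```

Under these definitions, a stricter operator can raise precision by committing fewer elements, which is why the abstract reports IoU alongside precision.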
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





