arXiv

Belief Consistency Between Foundation-Model Evidence and Geometric Perception in Persistent Robotic Maps

Title: Harmonizing Foundation-Model Evidence with Geometric Perception in Durable Robotic Mapping

Abstract:

Autonomous robots that rely on persistent mapping increasingly combine two distinct information streams about the same environment: a geometric perception layer whose assertions are well defined, and a foundation-model layer that produces semantic interpretations without calibrated reliability. Current mapping architectures typically merge these channels by treating the foundation model as an additional voter within a per-element posterior distribution. However, these systems often fail to account for the model’s per-class reliability, and they lack mechanisms to detect or handle contradictions between the two channels in real time.

To address these limitations, we introduce a novel update operator with two complementary components: a per-class calibrated commitment gate and a per-event conflict-drop window. The conflict-drop window prevents the system from accepting foundation-model assertions that the geometric channel immediately contradicts. We validate the approach on the KITTI-360 and ScanNet datasets, using both an oracle geometric channel (based on panoptic ground truth) and an off-the-shelf online semantic segmenter (Mask2Former) to assess real-world efficacy.
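The abstract does not specify how the two components are implemented, so the following is only a minimal sketch of how a per-class gate and a conflict-drop window could interact. Every name here (`GatedMap`, `class_precision`, `min_precision`, `window`, and the three methods) is hypothetical, and the rule "commit only after the window passes without a contradicting geometric observation" is an assumption made for illustration, not the authors' operator.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAssertion:
    element_id: int   # map element the foundation model labeled
    label: str        # semantic class asserted by the foundation model
    step: int         # time step at which the assertion arrived


@dataclass
class GatedMap:
    # Hypothetical per-class calibration table (e.g. held-out precision).
    class_precision: dict
    min_precision: float = 0.9   # assumed commitment threshold
    window: int = 3              # assumed conflict-drop window, in steps
    committed: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def propose(self, element_id, label, step):
        """Per-class gate: admit an assertion only for sufficiently reliable classes."""
        if self.class_precision.get(label, 0.0) < self.min_precision:
            return False
        self.pending.append(PendingAssertion(element_id, label, step))
        return True

    def observe_geometry(self, element_id, label, step):
        """Drop pending assertions the geometric channel contradicts within the window."""
        self.pending = [
            p for p in self.pending
            if not (p.element_id == element_id
                    and p.label != label
                    and step - p.step <= self.window)
        ]

    def advance(self, step):
        """Commit pending assertions that outlived the window without a conflict."""
        still_pending = []
        for p in self.pending:
            if step - p.step > self.window:
                self.committed[p.element_id] = p.label
            else:
                still_pending.append(p)
        self.pending = still_pending
```

In this sketch a foundation-model label never reaches the committed map directly: it first has to belong to a class that passes the gate, and then has to survive the window without the geometric channel disagreeing about the same element.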

Our results demonstrate that this operator yields substantially more accurate committed maps. On KITTI-360, car commitment precision reaches 99.7% with our method, compared with 43.9% for an operator relying solely on calibration, and the mean per-class Intersection over Union (IoU) improves from 0.180 to 0.522. The framework also retains more compositional true positives, at higher precision, than monolithic compositional Vision-Language Model (VLM) prompts. It maintains deployment-grade quality with both oracle and off-the-shelf geometric channels and remains robust to the choice of foundation model.
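For readers unfamiliar with the two reported metrics, the sketch below computes precision and mean per-class IoU using their standard definitions. The abstract does not describe the evaluation protocol, so the function names and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration.

```python
def precision(true_positives, false_positives):
    """Fraction of committed labels that are correct: TP / (TP + FP)."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def mean_per_class_iou(predicted, ground_truth, classes):
    """Average over classes of |pred AND truth| / |pred OR truth|.

    `predicted` and `ground_truth` are equal-length label sequences,
    one entry per map element. Classes absent from both are skipped
    (an assumption; some protocols count them as zero instead).
    """
    ious = []
    for cls in classes:
        pairs = list(zip(predicted, ground_truth))
        intersection = sum(1 for p, t in pairs if p == cls and t == cls)
        union = sum(1 for p, t in pairs if p == cls or t == cls)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

With predictions `["car", "car", "road", "road"]` against ground truth `["car", "road", "road", "road"]`, the per-class IoUs are 1/2 for `car` and 2/3 for `road`, so the mean is 7/12.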


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
