arXiv

PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps

Abstract: Embodied visual navigation, a capability where agents interpret intricate environments and execute actions to reach objectives using only raw sensory data, serves as the foundation for numerous applications, including household service robotics, assistive technologies, and large-scale autonomous exploration. Despite this importance, current efforts to integrate vision-and-language navigation (VLN) with object goal navigation (ObjNav) have largely focused on architectural integration, mixed-task training protocols, and extensive vision-language pretraining. These approaches have not sufficiently investigated whether independently trained vision and language encoders might inherently possess a shared semantic structure. Furthermore, while object-centric topological maps currently rely on explicit cross-modal supervision—such as CLIP or large vision-language models—to anchor language goals, it remains unclear whether such grounding can be achieved using a map constructed solely from visual data.

To tackle these issues, we expand the Platonic Representation Hypothesis to the domain of embodied navigation. We propose viewing vision-only ObjNav, cross-modal ObjNav, and VLN as three distinct interfaces accessing a single, object-centric semantic manifold. In this context, we present PlatonicNav, a novel framework that requires no training. PlatonicNav utilizes a Platonic Topological Map that combines geometric and semantic node distances derived from a self-supervised visual encoder. It grounds language goals through blind matching, eliminating the need for any paired vision-language training data. Our extensive evaluations on simulation benchmarks, including HM3D-IIN, OVON, and R2R-CE on MP3D, along with real-world deployment on the Unitree Go2 robot, show that PlatonicNav achieves robust generalization across various tasks, modalities, and robotic embodiments without explicit cross-modal training.
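One way to picture a node distance that mixes geometry and semantics is a weighted sum of a Euclidean distance between node positions and a cosine distance between node features. The sketch below is illustrative only: the abstract does not say how PlatonicNav combines the two terms, so the convex weighting, the `alpha` and `scale` parameters, and the function names are all assumptions rather than the authors' formulation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def fused_node_distance(pos_a, pos_b, feat_a, feat_b, alpha=0.5, scale=1.0):
    """Hypothetical fused distance between two map nodes.

    pos_*  : node positions (e.g. metres in the map frame)
    feat_* : node features from a visual encoder
    alpha  : assumed weight on the geometric term (0..1)
    scale  : assumed length scale that makes the geometric term unitless
    """
    geometric = math.dist(pos_a, pos_b) / scale
    semantic = cosine_distance(feat_a, feat_b)
    return alpha * geometric + (1.0 - alpha) * semantic
```

With `alpha=1.0` the distance reduces to the purely geometric term, and with `alpha=0.0` to the purely semantic one, which makes it easy to see how much each term contributes.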

Code: https://github.com/AIGeeksGroup/PlatonicNav. Website: https://aigeeksgroup.github.io/PlatonicNav.


Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC.
