PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
Title: PlatonicNav: Revealing Semantic Correspondence in Navigation via Platonic Topological Maps
Abstract: Embodied visual navigation, in which agents interpret intricate environments and act to reach objectives using only raw sensory data, underpins numerous applications, including household service robotics, assistive technologies, and large-scale autonomous exploration. Despite this importance, current efforts to integrate vision-and-language navigation (VLN) with object goal navigation (ObjNav) have largely focused on architectural integration, mixed-task training protocols, and extensive vision-language pretraining. These approaches have not sufficiently investigated whether independently trained vision and language encoders might inherently share a semantic structure. Furthermore, object-centric topological maps currently rely on explicit cross-modal supervision (such as CLIP or large vision-language models) to anchor language goals, and it remains unclear whether such grounding can be achieved using a map constructed solely from visual data.
To tackle these issues, we extend the Platonic Representation Hypothesis to embodied navigation. We propose viewing vision-only ObjNav, cross-modal ObjNav, and VLN as three distinct interfaces to a single, object-centric semantic manifold. In this context, we present PlatonicNav, a novel training-free framework. PlatonicNav uses a Platonic Topological Map that combines geometric and semantic node distances derived from a self-supervised visual encoder. It grounds language goals through blind matching, eliminating the need for any paired vision-language training data. Our extensive evaluations on simulation benchmarks, including HM3D-IIN, OVON, and R2R-CE on MP3D, along with real-world deployment on the Unitree Go2 robot, show that PlatonicNav generalizes robustly across tasks, modalities, and robotic embodiments without explicit cross-modal training.
Code: https://github.com/AIGeeksGroup/PlatonicNav. Website: https://aigeeksgroup.github.io/PlatonicNav.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
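Illustrative sketch (not from the paper): the abstract states that language goals are grounded through blind matching, without paired vision-language data, but it does not give the procedure. One common reading of blind matching is to compare the pairwise similarity structure inside each modality and search for the assignment under which the two structures agree best. The Python below is a minimal sketch under that assumption. The function names `cosine_kernel` and `blind_match`, the Frobenius cost, and the brute-force search are all hypothetical choices made for illustration and should not be taken as PlatonicNav's implementation.

```python
from itertools import permutations

import numpy as np


def cosine_kernel(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def blind_match(visual, text):
    """Match text rows to visual rows using only intra-modal structure.

    visual: (n, d_v) embeddings of n map nodes from a vision-only encoder.
    text:   (n, d_t) embeddings of n candidate labels from a language encoder.
    The two spaces may have different dimensions; no cross-modal pairs are used.

    Returns (perm, cost), where text row perm[i] is assigned to visual row i.
    Brute force over all n! permutations, so only suitable for small n.
    """
    kv, kt = cosine_kernel(visual), cosine_kernel(text)
    n = kv.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in permutations(range(n)):
        idx = np.array(perm)
        # Reorder the text kernel and measure disagreement with the visual one.
        cost = np.linalg.norm(kv - kt[np.ix_(idx, idx)])
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), float(best_cost)


if __name__ == "__main__":
    # Synthetic check: "text" is a shuffled, isometrically re-embedded copy
    # of "visual", so the similarity structure is shared by construction.
    rng = np.random.default_rng(0)
    n, d_v, d_t = 5, 8, 12
    visual = rng.normal(size=(n, d_v))
    q, _ = np.linalg.qr(rng.normal(size=(d_t, d_v)))  # orthonormal columns
    true_perm = rng.permutation(n)
    text = np.empty((n, d_t))
    text[true_perm] = visual @ q.T
    perm, cost = blind_match(visual, text)
    print("recovered:", perm, "truth:", true_perm.tolist(), "cost:", cost)
```

The exhaustive search is only practical for a handful of nodes; a larger map would need an approximate solver for the underlying quadratic assignment problem, and real encoders would share structure only approximately rather than exactly as in the synthetic check.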





