arXiv

EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation

Abstract:

Effective long-horizon planning in zero-shot embodied navigation relies heavily on robust memory systems. Traditional approaches face significant limitations: detector-centric scene graphs tend to reduce observations to sparse nodes, which often results in the loss of detailed visual information and the accumulation of noise, whereas methods based on 3D reconstruction are typically too computationally expensive. To address these challenges, we introduce EvoMemNav, a novel framework designed for efficient, self-evolving, and fine-grained memory management in zero-shot embodied navigation.

At the core of EvoMemNav is the Visual-Semantic Memory Graph (VSMGraph). It treats raw visual views as the primary memory elements and organizes them, through lightweight semantic indicators and topological connections, into a hierarchy of rooms, views, and objects. Because the views themselves are stored, fine-grained details are retained to support accurate disambiguation and stop verification. To keep the cost of a growing memory bounded, we propose a budgeted coarse-to-fine policy. This strategy first compresses the search space during a coarse stage to identify promising regions, then engages a Vision-Language Model (VLM) only during a fine stage for specific verification and decision-making tasks.
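The budgeted coarse-to-fine idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names (`View`, `MemoryGraph`), the tag-overlap scoring, and the `verify` callable standing in for a VLM query are all assumptions made for illustration. Object nodes are reduced here to a set of labels attached to each view.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class View:
    """A stored view: the raw frame is kept, indexed by cheap semantic tags."""
    view_id: str
    room: str                 # parent room label (top of the hierarchy)
    tags: Set[str]            # assumed form of the "semantic indicators": object labels
    image_ref: str            # pointer to the raw frame (hypothetical path)
    neighbors: List[str] = field(default_factory=list)  # topological links


@dataclass
class MemoryGraph:
    views: Dict[str, View] = field(default_factory=dict)

    def add_view(self, view: View) -> None:
        self.views[view.view_id] = view

    def rooms(self) -> Dict[str, List[View]]:
        """Group views by room to expose the room -> view -> object hierarchy."""
        grouped: Dict[str, List[View]] = {}
        for v in self.views.values():
            grouped.setdefault(v.room, []).append(v)
        return grouped


def coarse_stage(graph: MemoryGraph, goal_terms: Set[str], budget: int) -> List[View]:
    """Cheap filter: rank views by tag overlap with the goal, keep at most `budget`."""
    scored = [(len(v.tags & goal_terms), v.view_id, v) for v in graph.views.values()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, ties by id
    return [v for _, _, v in scored[:budget]]


def fine_stage(candidates: List[View], verify: Callable[[View], bool]) -> Optional[View]:
    """Expensive check (a VLM call in practice) applied only to the shortlist."""
    for view in candidates:
        if verify(view):
            return view
    return None


# Toy example with four stored views.
graph = MemoryGraph()
graph.add_view(View("v1", "kitchen", {"sink", "fridge"}, "frames/v1.png"))
graph.add_view(View("v2", "bedroom", {"bed", "lamp"}, "frames/v2.png"))
graph.add_view(View("v3", "living_room", {"sofa", "lamp", "tv"}, "frames/v3.png"))
graph.add_view(View("v4", "bedroom", {"bed", "pillow", "lamp"}, "frames/v4.png"))

vlm_calls: List[str] = []


def fake_vlm_verify(view: View) -> bool:
    """Stand-in for a VLM query; here it simply checks for a 'pillow' tag."""
    vlm_calls.append(view.view_id)
    return "pillow" in view.tags


shortlist = coarse_stage(graph, goal_terms={"bed", "lamp"}, budget=2)
match = fine_stage(shortlist, fake_vlm_verify)
```

In this toy run the coarse stage discards the kitchen and living-room views before any expensive call is made, so the stand-in verifier is invoked at most `budget` times regardless of how many views the graph holds.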

Furthermore, EvoMemNav goes beyond static memory storage by implementing reflection-driven write-back mechanisms after each subtask. This process updates graph-attached priors that encapsulate accumulated environmental knowledge, allowing the system to refine future decisions without retraining. Our experimental results on the GOAT-Bench and HM3D datasets demonstrate consistent improvements in Success Rate (SR) and Success weighted by Path Length (SPL) across object, text-description, and image-goal modalities. These gains are accompanied by an enhanced ability to disambiguate multiple instances, fewer premature stops, and superior zero-shot generalization.
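For readers unfamiliar with the two reported metrics, the sketch below computes SR and SPL using the standard definition from the embodied navigation literature: SPL averages, over episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the shortest-path and actual path lengths. The function names and the episode values are illustrative assumptions, not results from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(successes) / len(successes)


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """Success weighted by Path Length.

    shortest[i]: shortest-path length from start to goal (assumed > 0).
    taken[i]:    length of the path the agent actually travelled.
    """
    total = 0.0
    for ok, l, p in zip(successes, shortest, taken):
        if ok:
            total += l / max(p, l)  # 1.0 for an optimal path, smaller for detours
    return total / len(successes)


# Made-up episodes: one success with a detour, one optimal success, one failure.
ep_success = [True, True, False]
ep_shortest = [5.0, 4.0, 6.0]
ep_taken = [10.0, 4.0, 3.0]

sr_value = success_rate(ep_success)
spl_value = spl(ep_success, ep_shortest, ep_taken)
```

A premature stop counts as a failure and contributes zero to both metrics, which is why reducing premature stops affects SR and SPL together.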


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
