Title: Ask When It Pays: Cost-Aware Open-Ended Interaction for Instance Goal Navigation
Abstract: Instance Goal Navigation (IGN) tasks an embodied agent with locating a specific object instance amidst distractors, guided only by an underspecified natural-language description. Because this ambiguity frequently cannot be resolved through perception and language alone, interacting with an oracle serves as a vital disambiguation tool. Previous interactive approaches permit oracle queries but fail to distinguish between lightweight clarifications and route-level guidance. Consequently, these agents often improve their success rates by posing repeated, high-information questions rather than addressing the root ambiguity efficiently.
We reframe interactive IGN as a cost-sensitive problem focused on reducing uncertainty. In this framework, the agent is tasked with selecting questions that maximize the reduction in navigation uncertainty relative to the associated penalty. To achieve this, we perform an information-gain analysis on existing navigation datasets to pinpoint cues that effectively lower uncertainty, resulting in a concise set of question types and data-driven weights.
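As an illustration of this selection principle, the following is a minimal sketch and not the paper's implementation. It assumes the agent holds a probability distribution over candidate instances, that each question type partitions those candidates by its possible answers, and that information gain (in bits) and question cost are expressed on a common scale so that "gain relative to penalty" can be read as gain minus cost. The function names, the partition representation, and the example costs are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, answer_groups):
    """Expected entropy reduction from asking one question.

    prior: dict mapping candidate instance -> probability of being the goal.
    answer_groups: partition of the candidates; the oracle's answer is
        assumed to reveal which group contains the goal (hypothetical model).
    """
    h_before = entropy(prior.values())
    h_after = 0.0
    for group in answer_groups:
        mass = sum(prior[c] for c in group)
        if mass == 0:
            continue
        # Posterior over the group's members if this answer is received.
        posterior = [prior[c] / mass for c in group]
        h_after += mass * entropy(posterior)
    return h_before - h_after


def select_question(prior, questions):
    """Pick the question with the largest positive (gain - cost).

    questions: dict mapping name -> (answer_groups, cost). Costs here are
        illustrative placeholders, not the paper's data-driven weights.
    Returns None when no question is worth its cost.
    """
    best_name, best_net = None, 0.0
    for name, (groups, cost) in questions.items():
        net = expected_information_gain(prior, groups) - cost
        if net > best_net:
            best_name, best_net = name, net
    return best_name


# Four candidate chairs, equally likely: 2 bits of uncertainty.
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
questions = {
    # Cheap attribute question that halves the candidate set.
    "color": ([["a", "b"], ["c", "d"]], 0.3),
    # Expensive route-level question that identifies the goal outright.
    "route": ([["a"], ["b"], ["c"], ["d"]], 1.5),
}
choice_when_uncertain = select_question(uniform, questions)

# When the agent is already nearly certain, no question pays for itself.
peaked = {"a": 0.97, "b": 0.01, "c": 0.01, "d": 0.01}
choice_when_confident = select_question(peaked, questions)
```

Under these assumed numbers the cheap attribute question is preferred over the route-level one when the agent is uncertain, and no question is selected when the prior is already peaked. A ratio of gain to cost would be an equally plausible reading of "relative to the associated penalty".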
Current interactive navigation benchmarks neither model the differing costs of question types nor assess how efficiently agents interact, which makes them inadequate for studying cost-sensitive behavior. Leveraging our taxonomy, we introduce a new benchmark designed to evaluate interaction efficiency and diagnose agent behavior. This benchmark incorporates a Weighted Success Rate metric that penalizes each query according to its derived cost. Additionally, we propose a zero-shot multimodal large language model (MLLM) navigator that decides at each decision step whether to query the oracle, asking only when the anticipated reduction in uncertainty outweighs the interaction cost.
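The abstract does not give the exact form of the Weighted Success Rate, so the sketch below is only one plausible reading: each successful episode starts from a score of 1, the costs of all queries asked in that episode are subtracted, and the result is clipped at 0. The function name, the clipping, and the example cost values are assumptions made for illustration.

```python
def weighted_success_rate(episodes, query_costs):
    """Average per-episode success discounted by query costs.

    episodes: list of (success, queries) pairs, where `queries` is the list
        of question-type names asked during the episode.
    query_costs: dict mapping question-type name -> cost (hypothetical values).
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, queries in episodes:
        if not success:
            continue  # failed episodes contribute 0 regardless of queries
        penalty = sum(query_costs[q] for q in queries)
        total += max(0.0, 1.0 - penalty)  # assumed clipping at zero
    return total / len(episodes)


# Illustrative costs: a cheap attribute clarification vs. route-level guidance.
costs = {"attribute": 0.1, "route": 0.6}
episodes = [
    (True, []),                  # success with no queries -> 1.0
    (True, ["attribute"]),       # success with one cheap query -> 0.9
    (False, ["route"]),          # failure -> 0.0
    (True, ["route", "route"]),  # success bought with heavy guidance -> 0.0
]
wsr = weighted_success_rate(episodes, costs)
```

With these placeholder values, three of four episodes succeed, yet the weighted score is well below 0.75 because one success relied on repeated route-level guidance.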
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





