arXiv

Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models

Abstract:

Enabling Vision-Language Models (VLMs) to perform spatial reasoning remains challenging. Existing methods typically treat VLMs as passive observers, which limits their usefulness in real-world settings. In addition, conventional reinforcement learning relies on sparse rewards, which limits its effectiveness on complex reasoning problems. Inspired by how pigeons build and use cognitive maps for navigation, we introduce an agentic framework for spatial reasoning.

Our approach first builds a dynamic cognitive map, which encodes the scene layout as the positions and orientations of objects and serves as persistent memory into which new visual observations are integrated. We also introduce Spatial Assertion Codes (SAC), a set of Python expressions that define spatial relationships programmatically. Evaluated against the dynamic cognitive map, SAC verifies intermediate reasoning steps and thereby provides dense reward signals to guide learning. The model is optimized through a combination of supervised learning and reinforcement fine-tuning.
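To make the interplay between the map and the assertion codes concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 2D ground plane, headings in degrees measured counter-clockwise from the +x axis, and a reward equal to the fraction of assertions that hold. The names `CognitiveMap`, `left_of`, `in_front_of`, and `dense_reward` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    heading_deg: float = 0.0  # assumed convention: 0 = +x axis, counter-clockwise


class CognitiveMap:
    """Hypothetical persistent store of object poses, updated as views arrive."""

    def __init__(self):
        self.poses = {}

    def update(self, name, x, y, heading_deg=0.0):
        # Insert a new object or overwrite its pose with the latest observation.
        self.poses[name] = Pose(x, y, heading_deg)

    def _viewer_frame(self, target, viewer):
        # Express the target's offset in the viewer's (forward, left) axes.
        v, t = self.poses[viewer], self.poses[target]
        theta = math.radians(v.heading_deg)
        dx, dy = t.x - v.x, t.y - v.y
        forward = dx * math.cos(theta) + dy * math.sin(theta)
        left = -dx * math.sin(theta) + dy * math.cos(theta)
        return forward, left

    def left_of(self, a, b, viewer):
        # `a` is left of `b` if it lies further along the viewer's left axis.
        return self._viewer_frame(a, viewer)[1] > self._viewer_frame(b, viewer)[1]

    def in_front_of(self, target, viewer):
        return self._viewer_frame(target, viewer)[0] > 0


def dense_reward(cmap, assertions):
    """Fraction of assertion expressions that evaluate to True on the map."""
    namespace = {"left_of": cmap.left_of, "in_front_of": cmap.in_front_of}
    passed = 0
    for code in assertions:
        try:
            # eval with emptied builtins is for illustration only, not a sandbox.
            ok = bool(eval(code, {"__builtins__": {}}, namespace))
        except Exception:
            ok = False  # malformed code or an unknown object counts as a failure
        passed += ok
    return passed / len(assertions) if assertions else 0.0


cmap = CognitiveMap()
cmap.update("agent", 0.0, 0.0, heading_deg=90.0)  # agent faces +y
cmap.update("chair", -1.0, 2.0)
cmap.update("table", 1.0, 2.0)

steps = [
    "left_of('chair', 'table', 'agent')",
    "in_front_of('table', 'agent')",
    "left_of('table', 'chair', 'agent')",
]
reward_before = dense_reward(cmap, steps)

# After the agent turns around, the same assertions are re-checked on the map.
cmap.update("agent", 0.0, 0.0, heading_deg=270.0)
reward_after = dense_reward(cmap, steps)
```

In this toy setup, each intermediate claim is scored individually, so a reasoning trace earns partial credit even when its final answer is wrong. The same assertions score differently after the agent rotates, which is the kind of viewpoint dependence a static map would miss.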

On the MindCube benchmark, our method achieves state-of-the-art results with an overall accuracy of 80.5%. On the challenging Rotation subset, it outperforms the previous best approach by 29.5 accuracy points, a relative improvement of 53.2%. Code and datasets are publicly available at https://github.com/dw-dengwei/active-spatial-reasoning.git.
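As a rough consistency check on the two Rotation figures, assume that the relative improvement is the absolute gain divided by the previous best accuracy; the abstract does not state the formula. The symbols $a_{\text{prev}}$ and $a_{\text{ours}}$ are introduced here only for illustration, and the resulting values are back-of-the-envelope estimates, not numbers reported in the abstract.

```latex
\[
a_{\text{prev}} \approx \frac{29.5}{0.532} \approx 55.5\%,
\qquad
a_{\text{ours}} \approx a_{\text{prev}} + 29.5 \approx 85.0\%
\]
```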


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
