arXiv

MindZero: Learning Online Mental Reasoning With Zero Annotations

June 2, 2026 · Shunchi Zhang, Jin Lu, Chuanyang Jin, Yichao Zhou, Zhining Zhang, Tianmin Shu

Abstract

To provide effective assistance in real-world scenarios, AI agents need a robust Theory of Mind (ToM), the capacity to infer human mental states from observed behavior. Despite significant progress, three critical hurdles persist: performing online inference with reliable uncertainty updates across multiple hypotheses, keeping reasoning efficient enough for real-time support, and coping with the scarcity of ground-truth mental state data in practical applications.

We propose MindZero, a self-supervised reinforcement learning framework designed to equip multimodal large language models (MLLMs) with efficient and resilient online mental reasoning capabilities. During the training phase, the model receives rewards for producing mental state hypotheses that best explain observed actions, as calculated by a planner. This approach mirrors model-based ToM reasoning and removes the dependency on explicit mental state annotations. Once trained, MindZero condenses this model-based reasoning into a rapid, single-pass inference process.
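As a rough illustration of the reward idea described above, the sketch below scores a hypothesized goal by how well a simple planner explains the observed actions. Everything in it is an assumption made for illustration and is not taken from the paper: the toy gridworld, the Boltzmann-rational planner with rationality parameter `beta`, the Manhattan-distance cost, and the names `action_log_probs` and `hypothesis_reward`. The abstract does not specify the planner or the exact reward.

```python
import math

# Hypothetical 4-connected gridworld actions (names are illustrative).
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action):
    """Apply an action to an (x, y) state on an unbounded grid."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def action_log_probs(state, goal, beta=2.0):
    """Log-probabilities of each action under a Boltzmann-rational planner.

    Assumption: an action's cost is the Manhattan distance to the goal
    after taking it, so actions that move closer are more probable.
    """
    scores = {}
    for name in ACTIONS:
        nx, ny = step(state, name)
        scores[name] = -beta * (abs(nx - goal[0]) + abs(ny - goal[1]))
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {name: s - log_z for name, s in scores.items()}


def hypothesis_reward(start, actions, goal, beta=2.0):
    """Mean log-likelihood of the observed actions given a hypothesized goal."""
    if not actions:
        raise ValueError("at least one observed action is required")
    state, total = start, 0.0
    for action in actions:
        total += action_log_probs(state, goal, beta)[action]
        state = step(state, action)
    return total / len(actions)


# Score two candidate goal hypotheses against one observed trajectory.
observed = ["right", "right", "up"]
candidate_goals = {"goal_A": (3, -2), "goal_B": (-3, 2)}
rewards = {
    name: hypothesis_reward((0, 0), observed, goal)
    for name, goal in candidate_goals.items()
}
best = max(rewards, key=rewards.get)
```

In a training loop of the kind the abstract describes, a score like this would serve as the reward for a hypothesis generated by the model, so no ground-truth mental state label is needed; here the hypotheses are hard-coded to keep the sketch self-contained.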

We benchmarked MindZero against baseline methods on complex mental reasoning and AI assistance tasks in gridworld and household environments. Our results indicate that LLMs alone fall short. Model-based approaches improve accuracy, but they are slow, computationally expensive, and limited by the capacity of the underlying MLLM. In contrast, MindZero strengthens the intrinsic ToM abilities of MLLMs and surpasses model-based methods in both speed and accuracy. These findings demonstrate that mental reasoning can be acquired effectively as a self-supervised skill.
