MindZero: Learning Online Mental Reasoning With Zero Annotations
Title: MindZero: Enabling Online Mental Reasoning Without Annotations
Abstract
To provide effective assistance in real-world scenarios, AI agents must possess a robust Theory of Mind (ToM), the capacity to deduce human mental states from observed behavior. Although significant progress has been made, several critical hurdles persist. These include the difficulty of performing online inference with reliable uncertainty updates across multiple hypotheses, the necessity for reasoning processes that are efficient enough for real-time support, and the scarcity of ground-truth mental state data in practical applications.
We propose MindZero, a self-supervised reinforcement learning framework designed to equip multimodal large language models (MLLMs) with efficient and resilient online mental reasoning capabilities. During the training phase, the model receives rewards for producing mental state hypotheses that best explain observed actions, as calculated by a planner. This approach mirrors model-based ToM reasoning and removes the dependency on explicit mental state annotations. Once trained, MindZero condenses this model-based reasoning into a rapid, single-pass inference process.
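The reward idea described above can be illustrated with a small sketch. The abstract does not specify the planner or the exact reward, so everything here is an assumption made for illustration: a hypothetical 2D gridworld, a Boltzmann-rational planner that prefers actions reducing Manhattan distance to a hypothesized goal, and a reward equal to the log-likelihood of the observed actions under that goal. The names `action_log_probs`, `hypothesis_reward`, and the parameter `beta` are hypothetical and not taken from the paper.

```python
import math

# Hypothetical gridworld actions: each moves the agent by one cell.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def action_log_probs(pos, goal, beta=2.0):
    """Assumed Boltzmann-rational planner: actions that bring the agent
    closer (in Manhattan distance) to the goal get higher probability."""
    scores = {}
    for name, (dx, dy) in ACTIONS.items():
        nxt = (pos[0] + dx, pos[1] + dy)
        dist = abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])
        scores[name] = -beta * dist
    # Normalize with log-sum-exp to obtain log-probabilities.
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {name: s - log_z for name, s in scores.items()}


def hypothesis_reward(start, actions, goal, beta=2.0):
    """Illustrative reward: log-likelihood of the observed action
    sequence if the agent were pursuing the hypothesized goal."""
    pos, total = start, 0.0
    for a in actions:
        total += action_log_probs(pos, goal, beta)[a]
        dx, dy = ACTIONS[a]
        pos = (pos[0] + dx, pos[1] + dy)
    return total


# Score three candidate goal hypotheses against one observed trajectory.
observed = ["right", "right", "down"]
candidates = [(3, 2), (0, 3), (-2, 0)]
rewards = {g: hypothesis_reward((0, 0), observed, g) for g in candidates}
best_goal = max(rewards, key=rewards.get)
```

In this toy setting, the hypothesis that lies in the direction the agent actually moved receives the highest reward, and no ground-truth goal label is needed to compute it. A training loop in the spirit of the abstract would use such a score to reinforce the hypotheses a model generates; that loop is omitted here because its details are not given in the source.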
We benchmarked MindZero against baseline methods in complex mental reasoning and AI assistance tasks within gridworld and household environments. Our results indicate that LLMs alone fall short; while model-based approaches boost accuracy, they suffer from high computational costs, slowness, and limitations tied to the underlying MLLM's capacity. Conversely, MindZero strengthens the intrinsic ToM abilities of MLLMs, surpassing model-based techniques in both speed and precision. These findings demonstrate that mental reasoning can be effectively acquired as a self-supervised skill.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




