arXiv

MindClaw: Closed-Loop Embodied Mental-State Reasoning for Precision Intervention

June 2, 2026 · Ruoxuan Zhang, Qiaoqiao Wan, Zhengguang Wang, Chenghao Yu, Hongxia Xie, Jianlong Fu, Wen-Huang Cheng

Abstract:

Theory of Mind (ToM) lets agents interpret the beliefs, goals, and intentions of other actors, a capability that is fundamental to human-centered embodied assistance. While current ToM benchmarks have significantly improved text and multimodal mental-state recognition, they predominantly assess offline question answering or the prediction of final actions. These evaluations do not adequately test whether an embodied agent can stay connected to a dynamic environment, update actor-specific beliefs, decide when reasoning is needed, and intervene only when assistance is beneficial.

Building upon MindPower, we extend robot-centric ToM reasoning to a real-time closed-loop setting and present MindClaw, a framework for embodied mental-state reasoning with precision intervention. MindClaw integrates multi-source inputs, belief memory, an embodied cognitive trigger skill, mental reasoning, and action generation. This architecture enables the agent to produce helpful actions at appropriate moments while remaining silent when intervention is not required. Our experiments reveal that direct Vision-Language Model (VLM) baselines struggle with task awareness and intervention calibration. In contrast, MindClaw delivers superior overall performance, underscoring the critical role of trigger-skill optimization in achieving effective closed-loop embodied ToM assistance.
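
The closed-loop flow named above (inputs, belief memory, trigger, reasoning, action) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Observation`, `BeliefMemory`, `should_trigger`, `step`) is hypothetical, and the trigger is a hand-written rule standing in for the optimized trigger skill, whose details the abstract does not give. The sketch assumes the agent intervenes only when an actor's remembered belief about a goal object disagrees with the observed world state, and otherwise returns `None` to stay silent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Observation:
    """One hypothetical time step of fused multi-source input."""
    actor: str                          # who was observed
    saw: Dict[str, str]                 # object -> location the actor witnessed
    world: Dict[str, str]               # object -> true current location
    goal_object: Optional[str] = None   # object the actor appears to want


@dataclass
class BeliefMemory:
    """Per-actor beliefs, updated only with what each actor witnessed."""
    beliefs: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def update(self, obs: Observation) -> None:
        self.beliefs.setdefault(obs.actor, {}).update(obs.saw)


def should_trigger(memory: BeliefMemory, obs: Observation) -> bool:
    """Assumed rule-based trigger: fire only on a false belief about the goal."""
    if obs.goal_object is None:
        return False
    believed = memory.beliefs.get(obs.actor, {}).get(obs.goal_object)
    actual = obs.world.get(obs.goal_object)
    return believed is not None and actual is not None and believed != actual


def step(memory: BeliefMemory, obs: Observation) -> Optional[str]:
    """One loop iteration: update memory, gate on the trigger, then act."""
    memory.update(obs)
    if not should_trigger(memory, obs):
        return None  # remain silent: no intervention needed
    obj = obs.goal_object
    believed = memory.beliefs[obs.actor][obj]
    actual = obs.world[obj]
    return f"Tell {obs.actor}: the {obj} is in the {actual}, not the {believed}."


memory = BeliefMemory()
# Alice sees the cup in the drawer; her belief matches the world.
first = step(memory, Observation("Alice", {"cup": "drawer"}, {"cup": "drawer"}, "cup"))
# The cup is moved while Alice is away, so she witnesses nothing new.
second = step(memory, Observation("Alice", {}, {"cup": "cabinet"}, "cup"))
```

In this toy run the first step yields no action because Alice's belief is correct, and the second yields a corrective message because her belief has gone stale. Gating reasoning and action behind a trigger is what lets the loop run continuously without intervening at every step.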

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC