Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents
Abstract:
Multimodal large language models (MLLMs) have substantially advanced mobile agents, yet proactive mobile assistance remains challenging: before an agent can decide how to help, it must first decide when to intervene. Current systems typically fold these two distinct decisions into a single MLLM-driven pipeline. This coupling causes goal misalignment, because the conservative filtering needed for intervention decisions conflicts with the comprehensive output needed for assistance generation. It also incurs unnecessary computational overhead in cases where the agent should remain silent.
To address these drawbacks, we introduce the Pre-Reasoning Perception Framework (PRPF), a two-stage architecture built on the principle of perceiving before reasoning. In the first stage, a lightweight Multimodal Proactive Perceptor (MPP) compresses the context and gates interventions. In the second stage, the Proactive Agent Reasoner (PAR) is invoked only when the MPP deems an intervention necessary. Evaluations on the ProactiveMobile benchmark show that, compared with the ProactiveMobile baseline, PRPF significantly lowers the false trigger rate (FTR) while improving both the success rate (SR) and inference efficiency.
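The perceive-then-reason gating described above can be illustrated with a minimal sketch. It is not the authors' implementation: `TwoStageAgent`, `PerceptionResult`, `StepOutcome`, `toy_perceive`, and `toy_reason` are hypothetical stand-ins, with a keyword rule in place of the learned MPP and a string template in place of the MLLM-based PAR. The `false_trigger_rate` helper assumes FTR is the fraction of steps not requiring help on which the agent intervened; the abstract does not define it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PerceptionResult:
    """Hypothetical output of the lightweight perceptor (MPP stand-in)."""
    should_intervene: bool
    compressed_context: str


@dataclass
class StepOutcome:
    intervened: bool
    assistance: Optional[str]


class TwoStageAgent:
    """Hypothetical perceive-before-reasoning gate: cheap stage first, expensive stage on demand."""

    def __init__(self, perceive: Callable[[str], PerceptionResult],
                 reason: Callable[[str], str]) -> None:
        self.perceive = perceive
        self.reason = reason
        self.reasoner_calls = 0  # counts how often the expensive stage actually runs

    def step(self, observation: str) -> StepOutcome:
        # Stage 1: the lightweight perceptor compresses context and decides whether to intervene.
        result = self.perceive(observation)
        if not result.should_intervene:
            # Stay silent: the expensive reasoner is never invoked.
            return StepOutcome(intervened=False, assistance=None)
        # Stage 2: call the expensive reasoner, on the compressed context only.
        self.reasoner_calls += 1
        return StepOutcome(intervened=True, assistance=self.reason(result.compressed_context))


def false_trigger_rate(outcomes: Sequence[StepOutcome], needs_help: Sequence[bool]) -> float:
    """Assumed FTR definition: share of no-help-needed steps on which the agent intervened."""
    silent_steps = [o for o, need in zip(outcomes, needs_help) if not need]
    if not silent_steps:
        return 0.0
    return sum(o.intervened for o in silent_steps) / len(silent_steps)


def toy_perceive(obs: str) -> PerceptionResult:
    # Hypothetical rule-based stand-in for the MPP: intervene only on a trigger phrase.
    return PerceptionResult(should_intervene="battery low" in obs,
                            compressed_context=obs[:40])


def toy_reason(ctx: str) -> str:
    # Hypothetical stand-in for the PAR (an MLLM call in a real system).
    return f"Offer help for: {ctx}"


agent = TwoStageAgent(toy_perceive, toy_reason)
observations = ["home screen idle", "battery low warning shown", "reading an article"]
needs_help = [False, True, False]
outcomes = [agent.step(o) for o in observations]
print(agent.reasoner_calls)                        # 1
print(false_trigger_rate(outcomes, needs_help))    # 0.0
```

In this toy run, the reasoner executes on only one of three steps. That is the efficiency argument in miniature: silent steps cost only the cheap perception pass, and the gate's decision alone determines the false-trigger behaviour.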
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



