Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
Title: Reevaluating the Utility of Tools: Adaptive Invocation for Dual-Mode Multimodal LLM Reasoning
Abstract: While tool-augmented reasoning represents a promising avenue for strengthening the inferential powers of multimodal large language models (MLLMs), current research predominantly concentrates on the mechanics of tool invocation, often overlooking the critical question of when tools are actually necessary. We posit that relying on tools is not universally advantageous; superfluous or ill-suited invocations can significantly inflate reasoning costs and potentially distort model predictions. To mitigate these challenges, we present AutoTool, a framework that dynamically determines the necessity of tool usage based on specific query attributes. Operating within a reinforcement learning paradigm, AutoTool employs an explicit dual-mode reasoning strategy, utilizing distinct reward functions for each mode to steer the model toward precise outputs. Furthermore, to avoid early convergence on a single reasoning approach, the system simultaneously explores and balances tool-assisted and text-centric reasoning during training, encouraging broader exploration in subsequent phases. Comprehensive evaluations confirm that AutoTool achieves superior performance and efficiency, securing a 21.8% accuracy increase on the V* benchmark relative to the baseline model, and delivering a 44.9% efficiency boost over current tool-augmented methods on the POPE benchmark. The codebase is accessible at https://github.com/MQinghe/AutoTool.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
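
The abstract describes two ideas: a separate reward for each reasoning mode, and balanced exploration of both modes during training. It does not give the actual reward definitions, so the following is only a minimal sketch under stated assumptions. The names `Rollout`, `mode_reward`, `balanced_modes`, the `"tool"`/`"text"` labels, and the per-call `tool_cost` penalty are all hypothetical and are not taken from the AutoTool codebase.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical structure, not AutoTool's)."""
    mode: str        # "tool" or "text"; assumed mode labels
    correct: bool    # whether the final answer matched the reference
    tool_calls: int  # number of tool invocations in the response


def mode_reward(rollout: Rollout, tool_cost: float = 0.1) -> float:
    """Assumed mode-specific reward; the paper's real rewards may differ."""
    base = 1.0 if rollout.correct else 0.0
    if rollout.mode == "tool":
        # Tool mode: a rollout that never calls a tool did not follow its mode.
        if rollout.tool_calls == 0:
            return 0.0
        # Charge a small assumed cost per call to discourage superfluous use.
        return base - tool_cost * rollout.tool_calls
    # Text mode: any tool call violates the mode, so no reward is given.
    return base if rollout.tool_calls == 0 else 0.0


def balanced_modes(group_size: int) -> list[str]:
    """Assign modes to a rollout group so both modes are explored together."""
    half = group_size // 2
    return ["tool"] * half + ["text"] * (group_size - half)


if __name__ == "__main__":
    for mode in balanced_modes(4):
        calls = 1 if mode == "tool" else 0
        print(mode, mode_reward(Rollout(mode, correct=True, tool_calls=calls)))
```

Splitting each rollout group evenly is one simple way to keep either mode from dominating early in training; the balancing scheme used in the paper may be different.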




