arXiv

ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents

Abstract: Tool-augmented vision-language agents can draw on external perceptual evidence from tools such as OCR, detection, and segmentation, but executing every proposed tool call is often expensive and redundant. This paper studies the pre-call control problem: deciding, before a tool's output enters the context, whether a perceptual tool call proposed by a ReAct-style VLM agent should be executed or skipped. Our evaluation across five benchmarks shows that baseline agents have poor local selectivity: helpful and harmful calls occur at comparable rates (11.8% versus 9.9%), and most calls do not change the immediate forced-answer prediction. To address this, we propose ToolGate, a lightweight external controller that makes execute-or-skip decisions from the trajectory text and simple structural features. Across two Qwen3-VL backbones, ToolGate reduces token cost to between 64% and 69% of the unrestricted ReAct baseline while preserving average accuracy in the cross-domain setting. When trained on matched-domain trajectories with Qwen3-VL-30B, it additionally improves average accuracy by 1.65 points. These findings show that tool-augmented VLM agents benefit not only from better perceptual tools but also from explicit mechanisms that decide when a tool's output is worth its cost.
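The abstract does not describe ToolGate's feature set, model class, or training procedure, so the Python below is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two ideas from the abstract: labeling a call as helpful, harmful, or neutral according to whether executing it changes the correctness of the forced-answer prediction (our reading of that metric), and an execute-or-skip gate that scores a proposed call from trajectory text plus structural features. The names `label_call`, `ProposedCall`, `features`, `execute_probability`, and `gate`, as well as the features, weights, and 0.5 threshold, are all hypothetical.

```python
import math
from dataclasses import dataclass


# Hypothetical labeling rule (an assumption, not the paper's definition): a call is
# judged by whether the forced answer (the answer the agent gives when made to answer
# immediately) is correct before versus after the tool output is added to the context.
def label_call(answer_before: str, answer_after: str, gold: str) -> str:
    before_ok = answer_before == gold
    after_ok = answer_after == gold
    if not before_ok and after_ok:
        return "helpful"
    if before_ok and not after_ok:
        return "harmful"
    return "neutral"  # correctness of the forced answer is unchanged


@dataclass
class ProposedCall:
    tool: str         # hypothetical tool name, e.g. "ocr", "detect", "segment"
    trajectory: str   # the agent's reasoning text so far
    step: int         # index of this call within the episode
    prior_calls: int  # number of tool calls already executed


# Hypothetical text and structural features; not ToolGate's actual feature set.
def features(call: ProposedCall) -> list[float]:
    text = call.trajectory.lower()
    uncertain = "not sure" in text or "unclear" in text
    return [
        1.0,                                   # bias term
        1.0 if call.tool == "ocr" else 0.0,    # tool-type indicator
        float(call.step),                      # position in the trajectory
        float(call.prior_calls),               # tool budget already spent
        1.0 if uncertain else 0.0,             # simple uncertainty cue in the text
    ]


# Illustrative weights only; a real gate would fit these on labeled trajectories.
WEIGHTS = [-0.5, 0.8, -0.3, -0.6, 1.5]


def execute_probability(call: ProposedCall, weights: list[float] = WEIGHTS) -> float:
    # Logistic score over the feature vector.
    z = sum(w * x for w, x in zip(weights, features(call)))
    return 1.0 / (1.0 + math.exp(-z))


def gate(call: ProposedCall, threshold: float = 0.5) -> str:
    # Decide before the tool runs, so a skipped call costs no tool-output tokens.
    return "execute" if execute_probability(call) >= threshold else "skip"


if __name__ == "__main__":
    unsure = ProposedCall("ocr", "The sign text is unclear.", step=1, prior_calls=0)
    confident = ProposedCall("detect", "I can see two dogs.", step=3, prior_calls=2)
    print(gate(unsure), gate(confident))  # execute skip
```

In this toy setup, an OCR call proposed early in a trajectory that flags the text as unclear scores about 0.82 and is executed, while a later detection call after two prior calls, with no uncertainty cue, scores about 0.07 and is skipped. A logistic scorer is used here only because it is small and transparent; the abstract says only that ToolGate is a lightweight controller over trajectory text and basic structural features.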


Source: arXiv. Generated at: 2026-06-03 00:00:00 UTC
