Global News Digest

arXiv

Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents

Title: Prioritizing Diversity Over Frequency: Reevaluating Tool Utilization in Visual Chain-of-Thought Agents

Abstract:

Visual agents invoke external visual tools within their visual chains of thought to gather fine-grained evidence. However, while existing literature has predominantly examined these tools in the context of visual search, their role in more complex visual reasoning remains largely unexplored. This study shifts focus from basic visual search to more demanding tasks, such as 3D spatial reasoning and medical visual question answering (VQA). In these settings, agents must integrate local evidence obtained through tools with the broader global context.

We identify a "tool-use collapse phenomenon," wherein models gradually cease utilizing tools even as their task accuracy improves. Furthermore, we note a distinct asymmetry in performance: (i) removing tool usage entirely leads to a decline in performance, while (ii) encouraging tool use results in only slight performance improvements, despite a significant increase in the frequency of tool invocation. Our analysis reveals that both standard training methods and incentives for tool usage tend to reduce the diversity of rollout trajectories. This reduction in diversity explains why increased tool usage does not necessarily translate to enhanced reasoning capabilities.
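The abstract does not specify how rollout diversity is measured. As a hypothetical illustration only, the Python sketch below scores a group of sampled rollouts (token lists) with two simple diversity measures: the share of distinct rollouts and the mean pairwise Jaccard distance of their token sets. It also reports the tool-call rate. The `<crop>` tool marker, both metrics, and the toy rollouts are assumptions, not the authors' setup. The point is only that a group can call tools more often while becoming less diverse.

```python
from itertools import combinations

# Hypothetical marker for a visual tool call inside a rollout; real agents
# use their own tool-call syntax. Not taken from the paper.
TOOL_TOKEN = "<crop>"


def distinct_ratio(rollouts: list[list[str]]) -> float:
    """Fraction of rollouts in the group that are unique token sequences."""
    return len({tuple(r) for r in rollouts}) / len(rollouts)


def mean_pairwise_jaccard_distance(rollouts: list[list[str]]) -> float:
    """Average (1 - Jaccard similarity) of token sets over all rollout pairs."""
    pairs = list(combinations(rollouts, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        sa, sb = set(a), set(b)
        union = sa | sb
        similarity = len(sa & sb) / len(union) if union else 1.0
        total += 1.0 - similarity
    return total / len(pairs)


def tool_call_rate(rollouts: list[list[str]]) -> float:
    """Average number of tool calls per rollout."""
    return sum(r.count(TOOL_TOKEN) for r in rollouts) / len(rollouts)


# Toy groups (hypothetical): varied reasoning with one tool call vs.
# identical rollouts that call the tool twice each.
group_a = [
    ["look", "left", "answer"],
    ["<crop>", "count", "answer"],
    ["compare", "depth", "answer"],
    ["look", "right", "answer"],
]
group_b = [["<crop>", "<crop>", "answer"]] * 4

for name, group in [("a", group_a), ("b", group_b)]:
    print(name,
          distinct_ratio(group),
          round(mean_pairwise_jaccard_distance(group), 3),
          tool_call_rate(group))
```

In the toy data, `group_b` calls the tool eight times as often per rollout as `group_a` yet collapses to a single distinct trajectory, a miniature version of the gap the abstract describes between tool frequency and exploration.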

Based on these insights, we introduce an entropy regularization term designed to foster more diverse exploration during rollouts. This approach achieves superior performance, even though the frequency of tool usage continues to decrease. We also observe similar dynamics in medical VQA, indicating that tool-use collapse extends beyond 3D spatial reasoning. Ultimately, our results suggest viewing tools as scaffolding during training; promoting broader exploration across both language generation and visual tool invocation enhances reasoning capabilities, notwithstanding the observed collapse in tool usage.
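The abstract names an entropy regularization term but gives neither its exact form nor its coefficient. A common way to encourage exploration in policy-gradient training is to subtract a scaled policy entropy from the loss. The sketch below computes that value for one rollout in plain Python, treating text tokens and tool-call tokens as a single action space. The REINFORCE-style objective, the per-step averaging, and the `beta` coefficient are assumptions rather than the paper's method, and a real trainer would compute this inside an autodiff framework to obtain gradients.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one step's vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(logits: list[float]) -> float:
    """Shannon entropy (nats) of the softmax distribution given by logits."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def entropy_regularized_loss(
    step_logits: list[list[float]],
    actions: list[int],
    advantage: float,
    beta: float = 0.01,  # assumed coefficient; the abstract gives none
) -> float:
    """Hypothetical REINFORCE-style loss for one rollout with an entropy bonus.

    step_logits[t] are the policy logits at step t (text and tool-call
    tokens alike); actions[t] is the sampled token index at that step.
    The policy-gradient term raises log-probabilities of high-advantage
    rollouts; the -beta * entropy term lowers the loss when per-step
    distributions stay spread out, rewarding broader exploration.
    """
    if not actions:
        raise ValueError("rollout must contain at least one step")
    pg, ent = 0.0, 0.0
    for logits, a in zip(step_logits, actions):
        pg += -advantage * log_softmax(logits)[a]
        ent += entropy(logits)
    steps = len(actions)
    return pg / steps - beta * ent / steps


# Usage: two steps over a 4-token vocabulary with uniform logits.
uniform = [[0.0] * 4, [0.0] * 4]
print(entropy_regularized_loss(uniform, [0, 1], advantage=1.0, beta=0.1))
```

Increasing `beta` makes low-entropy (collapsed) policies comparatively more costly, which is the lever the abstract describes for keeping rollouts diverse even as tool calls become rarer.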

Project page: https://scaffolded-exploration.github.io


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is discussing with regulators to ease ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.