arXiv

B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation

Abstract: Pixel-level scene understanding, a core component of computer vision, relies heavily on segmentation, which supports critical applications such as medical image analysis and autonomous perception. In the realm of complex referring segmentation, contemporary approaches typically combine large vision-language models with segmentation decoders. In this setup, the language model processes the image and prompt, while the decoder generates the target mask. While reinforcement learning has proven effective for enhancing reasoning-capable vision-language systems, the optimization of trainable components like segmentation decoders usually relies on separate, differentiable objectives. The theoretical integration of these objectives into reinforcement learning frameworks remains largely unexamined. To address this, we propose Group Relative Tool Optimization (GRTO), a rigorous mathematical framework designed to jointly optimize a policy alongside differentiable tool usage. GRTO leverages rollouts from Group Relative Policy Optimization (GRPO) to refine the auxiliary tool objective, allowing gradients from the decoder to enhance policy rewards. Additionally, we introduce Bootstrapped-GRTO (B-GRTO), a cost-effective pre-training strategy that accelerates tool bootstrapping, resulting in quicker convergence and enhanced performance. Evaluations across three demanding referring segmentation benchmarks show that B-GRTO significantly outperforms standard GRPO, achieving results that are comparable to or better than current state-of-the-art methods tailored to specific domains. These findings highlight the benefits of integrating reinforcement learning with differentiable auxiliary objectives for segmentation tasks that require intensive reasoning.
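As a point of reference for the GRPO rollouts mentioned above, the following is a minimal sketch of group-relative reward normalization, the step from which GRPO takes its name. It is not the GRTO objective, which the abstract does not specify. The function name, the example reward values (imagined here as mask-quality scores), the use of the population standard deviation, and the `eps` stabilizer are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    Each rollout's advantage is its reward minus the group mean, divided
    by the group standard deviation. Population std and eps are assumed
    choices for this sketch, not details taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts sampled for one image and prompt.
rewards = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline.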


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
