Global News Digest

arXiv

Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

Title: Enhancing RL-Driven Visual Reasoning via Selective Adversarial Entropy Intervention

Abstract: Reinforcement learning (RL) has emerged as a prevalent strategy for bolstering the reasoning faculties of vision-language models (VLMs). Within the landscape of RL-based fine-tuning, entropy intervention has proven to be a potent mechanism for augmenting exploratory capacity, which subsequently elevates policy performance. However, while prior research largely confines entropy intervention to the manipulation of specific token updates during RL policy optimization, it frequently overlooks the potential of intervening during the RL sampling phase. Addressing this gap, we demonstrate that such intervention can significantly enhance the performance of GRPO by fostering greater response diversity. To this end, we introduce Selective-adversarial Entropy Intervention (SaEI), a method that amplifies policy entropy by perturbing visual inputs through a token-selective adversarial objective derived from the entropy of sampled responses. Our approach comprises two key components: first, we develop Entropy-guided Adversarial Sampling (EgAS), which treats the entropy of sampled responses as an adversarial objective. By leveraging the resulting adversarial gradient to attack visual inputs, EgAS generates adversarial samples that encourage the policy model to traverse a broader answer space during RL sampling. Second, we introduce Token-selective Entropy Computation (TsEC) to optimize the efficacy of the EgAS attack, ensuring that factual knowledge within VLMs remains intact. Comprehensive experiments conducted on both in-domain and out-of-domain datasets confirm that our method substantially enhances policy exploration through entropy intervention, thereby improving reasoning capabilities. The code will be made available upon acceptance.
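
The mechanism the abstract describes can be illustrated with a toy sketch: compute the entropy of each response token, keep only selected tokens, and nudge the visual input along the gradient that raises that entropy. This is not the authors' implementation. The linear "policy" (one weight matrix per token position), the entropy threshold used as the selection rule, and the FGSM-style sign step with epsilon are all assumptions made for illustration, since the abstract specifies none of them.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def selective_entropy(weights, x, threshold):
    """Sum the entropy of selected tokens and return its gradient w.r.t. x.

    weights: one (vocab x pixels) matrix per token position (toy linear policy).
    threshold: hypothetical selection rule; tokens whose entropy falls below
    it are skipped, so confident tokens do not drive the attack.
    """
    total = 0.0
    grad = [0.0] * len(x)
    for W in weights:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        p = softmax(logits)
        h = entropy(p)
        if h < threshold:
            continue  # token not selected
        total += h
        # For p = softmax(z): dH/dz_i = -p_i * (log p_i + H)
        dz = [-pi * (math.log(pi) + h) if pi > 0.0 else 0.0 for pi in p]
        # Chain rule through the linear map z = W x
        for i, row in enumerate(W):
            for j, w in enumerate(row):
                grad[j] += dz[i] * w
    return total, grad

def adversarial_step(weights, x, epsilon=0.05, threshold=0.5):
    """One sign-gradient ascent step on the selected-token entropy.

    The sign step and the [0, 1] pixel clipping are assumed choices.
    """
    _, grad = selective_entropy(weights, x, threshold)
    sign = lambda g: (g > 0) - (g < 0)
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]

if __name__ == "__main__":
    uncertain = [[2.0, 0.0], [0.0, 2.0]]   # token with a spread-out distribution
    confident = [[10.0, 0.0], [0.0, 0.0]]  # token with a near-certain prediction
    x = [0.8, 0.2]
    before, _ = selective_entropy([uncertain, confident], x, 0.5)
    x_adv = adversarial_step([uncertain, confident], x)
    after, _ = selective_entropy([uncertain, confident], x_adv, 0.5)
    print(before, after, x_adv)
```

In this sketch the confident token contributes nothing to the gradient, which mirrors the stated goal of leaving factual knowledge intact while the perturbed input widens the distribution over the remaining tokens.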


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
