Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection
Title: Beyond Entropy: Enhancing Visual Reasoning in Reinforcement Learning through Vision-Anchored Token Selection
Abstract:
Although token-level entropy is widely regarded as a robust mechanism for credit assignment in text-based reinforcement learning with verifiable rewards (RLVR), its efficacy in the domain of visual reasoning remains uncertain. Our controlled experiments reveal that this approach fails in visual contexts because it neglects vision-sensitive tokens, which inherently possess low entropy. While contemporary multimodal RL approaches increasingly recognize the value of visual perception, they often fail to adequately bridge the gap between precise perceptual grounding and semantic reasoning. These methods either lack systematic visual metrics or overlook the fact that token entropy primarily facilitates semantic exploration. To resolve these issues, we present VEPO (Vision-Entropy token-selection for Policy Optimization), a novel RL framework that effectively combines visual sensitivity with token entropy through a principled multiplicative coupling. This mechanism directs gradient credit specifically to tokens that are both visually grounded and highly informative. Comprehensive experiments highlight VEPO’s superior performance, surpassing the entropy-only baseline by 2.28 points at the 7B scale and 3.15 points at the 3B scale. Furthermore, ablation studies confirm the validity and soundness of our proposed method.
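The multiplicative coupling described above can be illustrated with a minimal sketch. The abstract does not say how visual sensitivity is measured, whether tokens are selected by a hard top-k cut or by soft weighting, or which policy-gradient loss is used, so everything here is an assumption made for illustration: the per-token sensitivities are taken as given inputs, and the names `token_entropy`, `select_tokens`, `masked_pg_loss`, and `keep_ratio` are hypothetical rather than taken from VEPO.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_tokens(entropies, sensitivities, keep_ratio=0.2):
    """Return a 0/1 mask keeping the top `keep_ratio` share of tokens,
    ranked by the product entropy * visual sensitivity.

    Assumption: a hard top-k cut; the source only states that the two
    signals are coupled multiplicatively.
    """
    if len(entropies) != len(sensitivities):
        raise ValueError("entropies and sensitivities must have equal length")
    scores = [h * s for h, s in zip(entropies, sensitivities)]
    k = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    mask = [0.0] * len(scores)
    for i in ranked[:k]:
        mask[i] = 1.0
    return mask


def masked_pg_loss(logprobs, advantages, mask):
    """REINFORCE-style surrogate loss averaged over selected tokens only.

    Assumption: a plain policy-gradient surrogate stands in for whatever
    objective the paper actually optimizes.
    """
    total = sum(m * a * lp for lp, a, m in zip(logprobs, advantages, mask))
    return -total / max(1.0, sum(mask))


# Toy response of four tokens (made-up values, not data from the paper).
entropies = [2.0, 0.1, 1.5, 0.3]
sensitivities = [0.05, 0.9, 0.8, 0.7]
mask = select_tokens(entropies, sensitivities, keep_ratio=0.5)
```

In this toy input, an entropy-only ranking would keep tokens 0 and 2. The product ranking keeps tokens 2 and 3 instead: token 0 has high entropy but almost no visual sensitivity, while token 3 has low entropy but responds strongly to the image, which is the kind of token the abstract says entropy alone neglects.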
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



