GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning
Title: GRPO-TTA: Enhancing Vision-Language Model Adaptation at Test Time through GRPO-Based Reinforcement Learning
Abstract:
The recent success of Group Relative Policy Optimization (GRPO) in post-training large language and vision-language models has sparked interest in its potential for test-time adaptation (TTA). This study investigates whether GRPO can significantly enhance the TTA capabilities of vision-language models. To answer this question, we introduce GRPO-TTA, a novel framework that adapts GRPO to the TTA setting by recasting class-specific prompt prediction as a group-wise policy optimization problem.
Our approach forms output groups by sampling from the top-K class candidates of the CLIP similarity distribution. This mechanism enables probability-driven optimization without requiring ground-truth labels. We further design reward functions tailored to test-time adaptation, including dispersion and alignment rewards, to guide the tuning of the visual encoder. Comprehensive evaluations across a variety of benchmarks show that GRPO-TTA consistently outperforms existing TTA methods, with especially large gains under natural distribution shifts.
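The group construction described above can be illustrated with a minimal sketch. It assumes that the group is drawn from a softmax restricted to the top-K classes and that advantages follow the usual GRPO normalization (reward minus the group mean, divided by the group standard deviation). The function names, the example logits, and the placeholder reward are all hypothetical. The abstract does not define the dispersion and alignment rewards, so they are not reproduced here.

```python
import numpy as np

def topk_group_sample(logits, k, group_size, rng):
    """Sample a group of class indices from the softmax restricted to the top-k classes."""
    topk = np.argsort(logits)[::-1][:k]        # indices of the k highest-scoring classes
    z = logits[topk] - logits[topk].max()      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()        # renormalize over the top-k only
    picks = rng.choice(k, size=group_size, p=probs)
    return topk[picks], probs[picks]

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize rewards within the group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rng = np.random.default_rng(0)
# Hypothetical scaled image-text similarities for one test image over 5 classes.
logits = np.array([2.0, 1.5, 0.3, -1.0, 0.9])
classes, probs = topk_group_sample(logits, k=3, group_size=8, rng=rng)

# Placeholder reward (assumption): the sampled class's own top-k probability.
# This stands in for the paper's rewards, which are not specified in the abstract.
rewards = probs
advantages = group_relative_advantages(rewards)
```

No ground-truth label appears anywhere in this sketch: both the sampled group and the placeholder reward are derived from the model's own similarity scores, which is what makes the setup usable at test time.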
Source: arXiv





