To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs
Title: Vision or Validation? Exposing Visual Sycophancy and Fragmented Beliefs in VLMs
Abstract:
Do Vision-Language Models (VLMs) truly depend on visual data when they provide correct answers? To investigate this, we propose a Tri-Layer Diagnostic Framework with three per-sample metrics: Latent Anomaly Detection, Visual Necessity Score, and Competition Score. Together, these metrics separate failures of perception, dependency, and alignment. Our evaluation across nine VLMs and 9,000 model-sample pairs, subjected to counterfactual blind, noise, and conflict interventions, reveals that 72.9% of cases display "Visual Sycophancy." This phenomenon, characterized by a "Split Beliefs" pattern, occurs when the model retains internal visual evidence yet decodes a hallucinated response. Conversely, not a single sample exhibited "Robust Refusal," suggesting that current alignment training has effectively eradicated refusal as a possible decoding output. Furthermore, scaling experiments within the Qwen-VL family demonstrate that increasing scale reduces reliance on language shortcuts both within and across generations, yet simultaneously intensifies Visual Sycophancy. This indicates that scaling and advanced post-training techniques alone are insufficient to solve the grounding challenge. Finally, our diagnostic scores enable a training-free selective-prediction approach, which can boost accuracy by up to 9.5 percentage points at 50% coverage.
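To make the selective-prediction claim concrete, the following is a minimal sketch of how accuracy at a fixed coverage is commonly computed. It is not the authors' implementation: the function name `selective_accuracy`, the single per-sample `scores` input, and the convention that a higher score means a more trustworthy answer are all assumptions for illustration. The abstract does not say how the three diagnostic metrics are combined into one ranking score.

```python
from typing import Sequence, Tuple


def selective_accuracy(
    scores: Sequence[float],
    correct: Sequence[bool],
    coverage: float = 0.5,
) -> Tuple[float, float]:
    """Return (accuracy on all samples, accuracy on the retained samples).

    `scores` is a hypothetical per-sample trust score (higher = keep first);
    `correct` marks whether the model's answer was right for each sample.
    """
    if len(scores) != len(correct) or len(scores) == 0:
        raise ValueError("scores and correct must be non-empty and equal length")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")

    n = len(scores)
    # Number of samples the model is allowed to answer at this coverage.
    n_keep = max(1, int(coverage * n))

    # Rank samples from most to least trusted and keep the top slice.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept = order[:n_keep]

    full_acc = sum(bool(c) for c in correct) / n
    kept_acc = sum(bool(correct[i]) for i in kept) / n_keep
    return full_acc, kept_acc


# Toy data, purely illustrative (not results from the paper).
toy_scores = [0.9, 0.1, 0.8, 0.2]
toy_correct = [True, False, True, True]
full, kept = selective_accuracy(toy_scores, toy_correct, coverage=0.5)
print(f"full accuracy = {full:.2f}, accuracy at 50% coverage = {kept:.2f}")
```

The gap between the two returned values is the accuracy gain from abstaining on the least trusted half of the samples. No retraining is involved, only a ranking of existing predictions.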
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC