arXiv

To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs


Abstract:

Do Vision-Language Models (VLMs) actually rely on visual input when they answer correctly? To investigate this, we propose a Tri-Layer Diagnostic Framework with three per-sample metrics: Latent Anomaly Detection, Visual Necessity Score, and Competition Score. Together, these metrics disentangle failures of perception, dependency, and alignment. Our evaluation spans nine VLMs and 9,000 model-sample pairs under counterfactual blind, noise, and conflict interventions. It reveals that 72.9% of cases exhibit "Visual Sycophancy": a "Split Beliefs" pattern in which the model retains the relevant visual evidence internally yet decodes a hallucinated response. By contrast, not a single sample exhibited "Robust Refusal," which suggests that current alignment training has effectively removed refusal from the outputs the model will decode. Scaling experiments within the Qwen-VL family further show that increasing scale reduces reliance on language shortcuts, both within and across model generations, but intensifies Visual Sycophancy. Scaling and advanced post-training alone are therefore insufficient to solve the grounding problem. Finally, our diagnostic scores enable a training-free selective-prediction scheme that improves accuracy by up to 9.5 percentage points at 50% coverage.
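The selective-prediction result can be illustrated with a generic sketch. Selective prediction answers only the fraction of samples (the coverage) that have the highest reliability score and abstains on the rest; accuracy is then measured on the answered subset. The function `selective_accuracy` below is hypothetical, and its `scores` argument is a placeholder for any per-sample score where higher means more trustworthy. How the paper turns its three diagnostic metrics into such a score is not described in the abstract, so this is an assumption, not the authors' procedure.

```python
from typing import Sequence


def selective_accuracy(scores: Sequence[float], correct: Sequence[bool], coverage: float) -> float:
    """Accuracy on the `coverage` fraction of samples with the highest scores.

    Assumption: `scores` is a hypothetical per-sample reliability score
    (higher = more trustworthy). It is NOT the paper's exact diagnostic formula.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Number of samples the model answers; the rest are abstained on.
    n_keep = max(1, round(coverage * len(scores)))
    # Sort sample indices by descending score and keep the top n_keep.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = order[:n_keep]
    # Accuracy is computed only over the answered (kept) samples.
    return sum(correct[i] for i in kept) / n_keep


# Toy data, invented purely for illustration.
scores = [0.9, 0.2, 0.7, 0.4]
correct = [True, False, True, False]
print(selective_accuracy(scores, correct, 1.0))  # 0.5 (answer everything)
print(selective_accuracy(scores, correct, 0.5))  # 1.0 (answer top half only)
```

In this toy case, answering only the two highest-scoring samples raises accuracy from 0.5 to 1.0. The numbers are made up and bear no relation to the reported 9.5-point gain; they only show how a score that ranks correct answers above incorrect ones translates into higher accuracy at reduced coverage.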


Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC
