arXiv

Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy

Title: Decoupling Visual Accuracy and Factual Knowledge in Large Vision-Language Models’ Visualization Competence

Abstract

While Large Vision-Language Models (LVLMs) show strong capabilities in interpreting visualizations, it remains unclear whether their answers reflect genuine reasoning over visual evidence or factual priors acquired during training. Existing evaluations often conflate the two, masking cases where memorized knowledge overrides an accurate reading of the chart. To address this, we introduce a framework that separates visual correctness from factual correctness, exposing the limitations of current visualization literacy assessments.
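To make the distinction concrete, the Python sketch below labels a single answer by comparing it with both the value the chart shows and the real-world fact. The names `label_answer` and `Outcome` and the exact-string matching are illustrative assumptions, not the CVLAT implementation. On a standard item the chart agrees with the fact, so a correct answer cannot be attributed to either source. Only a counterfactual item, where the two disagree, separates them.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for one answer to a chart question."""
    VISUAL_AND_FACTUAL = "visual_and_factual"  # chart and fact agree; answer matches both
    VISUAL_ONLY = "visual_only"                # answer follows the chart, not the prior
    FACTUAL_ONLY = "factual_only"              # answer follows the prior, not the chart
    NEITHER = "neither"                        # answer matches neither source


def _normalize(text: str) -> str:
    # Exact match after trimming and lowercasing; a real grader would be more lenient.
    return text.strip().lower()


def label_answer(answer: str, chart_answer: str, fact_answer: str) -> Outcome:
    """Compare an answer with the chart-shown value and the real-world fact."""
    a = _normalize(answer)
    matches_chart = a == _normalize(chart_answer)
    matches_fact = a == _normalize(fact_answer)
    if matches_chart and matches_fact:
        return Outcome.VISUAL_AND_FACTUAL
    if matches_chart:
        return Outcome.VISUAL_ONLY
    if matches_fact:
        return Outcome.FACTUAL_ONLY
    return Outcome.NEITHER


# Standard item: chart and fact agree, so a correct answer is ambiguous.
print(label_answer("A", "A", "A").value)  # visual_and_factual
# Counterfactual item: chart shows "B", but the answer follows the prior "A".
print(label_answer("A", "B", "A").value)  # factual_only
```

In the second call the chart shows B, yet the answer repeats the prior A. A benchmark that only checks agreement with real-world facts would count this as correct.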

We evaluate 15 state-of-the-art LVLMs in three experiments:

  1. On standard benchmarks such as VLAT, some models reach human-level performance, but this success may stem from factual recall rather than genuine visual comprehension. Conversely, tests built on randomized data, such as reVLAT, tend to underestimate literacy when entrenched factual priors override an otherwise correct visual analysis.
  2. We developed the Counterfactual Visualization Literacy Assessment Test (CVLAT), which uses capability-normalized arbitration metrics to classify models by their Visual-Factual Reliance Index (VFRI). This analysis identifies most models as visualization-oriented and a minority as fact-knowledge-oriented, although several models with VFRI values near zero warrant cautious interpretation. In addition, a human baseline study (N=30) on the same counterfactual items showed that participants predominantly prioritized chart data over conflicting facts, establishing a human reference point.
  3. Prompt-based interventions can shift how models prioritize the two sources, but their efficacy varies substantially across models and is asymmetric with respect to the direction of the shift. Notably, high chart-reading proficiency does not necessarily imply controllability via prompting.
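The abstract does not define VFRI or the capability-normalized arbitration metrics, so the Python sketch below uses a simple stand-in: a signed index over counterfactual items that compares chart-following answers with prior-following answers and ignores items the model got wrong on both counts. The formula, the label strings, and the 0.1 near-zero band are hypothetical choices for illustration only, not the paper's definitions.

```python
from collections import Counter


def reliance_index(labels: list[str]) -> float:
    """Hypothetical signed reliance score over counterfactual items.

    +1.0: every decisive answer followed the chart ("visual_only").
    -1.0: every decisive answer followed the prior ("factual_only").
    Other labels (e.g. "neither") are excluded from the denominator.
    """
    counts = Counter(labels)
    visual = counts["visual_only"]
    factual = counts["factual_only"]
    decisive = visual + factual
    if decisive == 0:
        raise ValueError("no chart-following or prior-following answers to compare")
    return (visual - factual) / decisive


def orientation(index: float, band: float = 0.1) -> str:
    """Map the index to an orientation; |index| <= band is treated as near zero."""
    if index > band:
        return "visualization-oriented"
    if index < -band:
        return "fact-knowledge-oriented"
    return "near zero (ambiguous)"


labels = ["visual_only"] * 7 + ["factual_only"] * 3 + ["neither"] * 5
score = reliance_index(labels)
print(f"{score:.2f}", orientation(score))  # 0.40 visualization-oriented
```

With 7 chart-following, 3 prior-following, and 5 other answers, the index is (7 - 3) / 10 = 0.4. Dropping the "neither" answers from the denominator is a crude way of discounting items the model could not read at all. The paper's actual capability normalization may differ.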

In summary, high visualization accuracy alone does not guarantee faithful visual reasoning. For LVLMs to be reliably integrated into visual analytics systems, it is essential to evaluate not just their visualization literacy, but also their ability to arbitrate between visual evidence and factual priors when these sources conflict.

Benchmark and code: https://github.com/JaeyoungKim-HCIL/CVLAT


Source: arXiv. Generated at: 2026-06-03 00:00:00 UTC
