Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy
Title: Decoupling Visual Accuracy and Factual Knowledge in Large Vision-Language Models’ Visualization Competence
Abstract
While Large Vision-Language Models (LVLMs) demonstrate robust capabilities in interpreting visualizations, it remains uncertain whether their outputs stem from authentic reasoning based on visual cues or from factual priors acquired during training. Existing evaluation methods often conflate these two factors, masking instances where memorized knowledge overrides accurate visual interpretation. To address this, we introduce a framework designed to separate visual correctness from factual correctness, thereby exposing the limitations of current visualization literacy assessments.
We evaluate 15 state-of-the-art LVLMs in three experiments:
- Standard benchmarks like VLAT show some models achieving human-level performance; however, this success may be driven by factual recall rather than true visual comprehension. Conversely, tests using randomized data, such as reVLAT, tend to underestimate literacy when correct visual analysis is overshadowed by entrenched factual priors.
- We develop the Counterfactual Visualization Literacy Assessment Test (CVLAT), which uses capability-normalized arbitration metrics to categorize models by their Visual-Factual Reliance Index (VFRI). This analysis identifies a majority of models as visualization-oriented and a minority as fact-knowledge-oriented, though several models have VFRI values near zero and should be classified with caution. In addition, a human baseline study (N=30) on the same counterfactual items shows that humans predominantly prioritize chart data over conflicting facts, establishing a human reference standard.
- Prompt-based interventions can shift which source a model prioritizes, but their efficacy varies substantially across models and is asymmetric in direction. Notably, high proficiency in chart reading does not necessarily imply controllability via prompting.
In summary, high visualization accuracy alone does not guarantee faithful visual reasoning. For LVLMs to be reliably integrated into visual analytics systems, it is essential to evaluate not just their visualization literacy, but also their ability to arbitrate between visual evidence and factual priors when these sources conflict.
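To make the idea of arbitration on counterfactual items concrete, the following is a minimal sketch of how a reliance score could be computed from model responses. It is an illustration under stated assumptions, not the paper's VFRI: the abstract does not give the VFRI formula or its capability normalization, so the names `CounterfactualItem`, `toy_reliance_index`, `orientation`, the sign convention (positive means chart-following), and the `margin` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterfactualItem:
    """One chart question whose plotted answer contradicts a known fact."""
    chart_answer: str  # what the counterfactual chart actually shows
    fact_answer: str   # what real-world knowledge would suggest

    def __post_init__(self):
        # A counterfactual item only makes sense if the two sources conflict.
        if self.chart_answer == self.fact_answer:
            raise ValueError("chart and fact answers must conflict")


def toy_reliance_index(items, responses):
    """Hypothetical index in [-1, 1]; NOT the paper's VFRI definition.

    +1: every decided response follows the chart.
    -1: every decided response follows the factual prior.
    Responses that match neither answer are ignored.
    """
    visual = sum(r == it.chart_answer for it, r in zip(items, responses))
    factual = sum(r == it.fact_answer for it, r in zip(items, responses))
    decided = visual + factual
    return 0.0 if decided == 0 else (visual - factual) / decided


def orientation(index, margin=0.1):
    """Label a score; the margin for 'near zero' is an assumed threshold."""
    if index > margin:
        return "visualization-oriented"
    if index < -margin:
        return "fact-knowledge-oriented"
    return "ambiguous"


# Made-up example data: four items, three answered from the chart, one from memory.
items = [
    CounterfactualItem(chart_answer="Brazil", fact_answer="China"),
    CounterfactualItem(chart_answer="2003", fact_answer="2008"),
    CounterfactualItem(chart_answer="decreasing", fact_answer="increasing"),
    CounterfactualItem(chart_answer="Q1", fact_answer="Q4"),
]
responses = ["Brazil", "2003", "decreasing", "Q4"]

score = toy_reliance_index(items, responses)
print(score, orientation(score))
```

In this toy setup, a score near zero means the model splits evenly between chart and prior, which is why such cases are hard to assign to either group.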
Benchmark and code: https://github.com/JaeyoungKim-HCIL/CVLAT
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





