Detect Before You Leap: Mirage Detection in Vision-Language Models
Title: Prioritize Detection Before Generation: Identifying Mirages in Vision-Language Models
Abstract:
Vision-language models (VLMs) frequently generate high-confidence visual answers even when the necessary visual evidence is missing, the input is blank, or the image is irrelevant to the query. This failure mode, termed "mirage" (Asadi et al. 2026), poses significant risks in specialized fields such as medical imaging and document visual question answering, where responses that sound plausible but lack visual grounding may be mistaken for evidence-based ones.
This study focuses on pre-release mirage detection: deciding, prior to response generation, whether a VLM should answer or abstain on a given image-question pair. To this end, we introduce Text-Conditioned Layer-wise Internal Alignment (TC-LIA), a model-agnostic technique that examines patch-token representations throughout the layers of a CLIP ViT-H/14 vision encoder. TC-LIA projects image patch tokens from each layer into the final CLIP embedding space and computes their similarity to the question embedding. This makes it possible to track whether visual evidence relevant to the question emerges across the vision network’s layers.
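The projection-and-similarity step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `layerwise_patch_alignment`, the array layout, and the use of a single shared projection matrix with no layer normalization before it are all hypothetical choices made for the example.

```python
import numpy as np

def layerwise_patch_alignment(hidden_states, proj, text_emb, eps=1e-8):
    """Cosine similarity between projected patch tokens and a question embedding.

    hidden_states: (L, N, D) patch tokens from L encoder layers (assumed layout).
    proj:          (D, E) matrix mapping encoder width to the joint embedding space
                   (assumed to be shared across layers for this sketch).
    text_emb:      (E,) question embedding in the joint space.
    Returns an (L, N) array: one alignment score per layer and patch.
    """
    # Project every layer's patch tokens into the joint embedding space.
    projected = hidden_states @ proj                      # (L, N, E)
    # L2-normalize both sides so the dot product is a cosine similarity.
    projected = projected / (np.linalg.norm(projected, axis=-1, keepdims=True) + eps)
    text = text_emb / (np.linalg.norm(text_emb) + eps)
    return projected @ text                               # (L, N)

if __name__ == "__main__":
    # Synthetic stand-ins for real encoder activations (sizes are illustrative only).
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(32, 256, 64))   # 32 layers, 256 patches, width 64
    projection = rng.normal(size=(64, 48))    # joint embedding dimension 48
    question = rng.normal(size=48)
    sims = layerwise_patch_alignment(tokens, projection, question)
    print(sims.shape)
```

With a real encoder, the per-layer patch tokens and the projection would come from the model itself; the abstract does not specify how intermediate layers are normalized before projection, so that detail is left out here.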
To summarize the resulting alignment trajectory, the method employs several metrics: final image-text cosine similarity, top-k patch-text alignment in late layers, the gain from early to late layers, and layer-wise slopes. These features are integrated into an ensemble that also incorporates pixel-statistic detection for blanks or noise, zero-shot domain routing, and structured VLM self-assessment.
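As a rough illustration of how such trajectory features might be computed from an (layers x patches) alignment matrix, consider the sketch below. The function name `trajectory_features`, the parameters `k`, `n_early`, and `n_late`, and the choice of a single least-squares slope over the per-layer top-k means are assumptions made for this example; the abstract does not give the exact definitions.

```python
import numpy as np

def trajectory_features(sims, final_sim, k=5, n_early=4, n_late=4):
    """Summarize a layer-wise alignment trajectory.

    sims:      (L, N) patch-text cosine similarities, one row per layer.
    final_sim: scalar image-text cosine similarity from the final embedding.
    k, n_early, n_late: hypothetical hyperparameters for this sketch.
    """
    num_layers, num_patches = sims.shape
    k = min(k, num_patches)
    # Mean of the k best-aligned patches in each layer.
    topk_per_layer = np.sort(sims, axis=1)[:, -k:].mean(axis=1)   # (L,)
    early = float(topk_per_layer[:n_early].mean())
    late = float(topk_per_layer[-n_late:].mean())
    # Least-squares slope of top-k alignment against layer index.
    slope = float(np.polyfit(np.arange(num_layers), topk_per_layer, deg=1)[0])
    return {
        "final_sim": float(final_sim),
        "late_topk": late,
        "early_late_gain": late - early,
        "slope": slope,
    }
```

A feature vector like this could then be passed to any downstream classifier alongside the other ensemble signals; how the authors combine them is not stated in the abstract.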
Evaluated across five VQA domains, three input conditions, and twelve VLM backbones, the most effective systems achieved three-class detection accuracy of approximately 94.6–94.7%, with mirage rates kept below 3%. In contrast, baseline methods exhibited mirage rates ranging from 21.7% to 66.6%.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




