Understanding the Effects of Distractors on Reasoning Vision-Language Models
Title: Analyzing the Impact of Distractors on Reasoning in Vision-Language Models
Abstract: This study examines how irrelevant information, or distractors, affects the test-time scaling of vision-language models (VLMs). Prior work on text-only language models shows that textual distractors exacerbate inverse scaling, producing longer but less effective reasoning chains; this work examines whether comparable dynamics occur in multimodal settings. To support this investigation, we present Idis (Images with distractors), a new visual question-answering dataset that varies distractors along both semantic and numerical axes. Our findings show that visual distractors affect reasoning VLMs through a mechanism distinct from that of textual distractors: inverse scaling persists, but visual distractors reduce accuracy without lengthening the reasoning process. We also show that attribute counts derived from reasoning traces offer insight into how distractors relate to both accuracy and reasoning length. Finally, as a validation step, we introduce a simple prompting technique designed to reduce distractor-driven predictions in reasoning VLMs.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




