arXiv

Understanding the Effects of Distractors on Reasoning Vision-Language Models

June 2, 2026 · Jiyun Bae, Hyunjong Ok, Sangwoo Mo, Jaeho Lee

Abstract: This study explores how irrelevant information, or distractors, affects the test-time scaling of vision-language models (VLMs). Prior work on text-only language models indicates that textual distractors exacerbate inverse scaling, producing longer but less effective reasoning chains. This work examines whether comparable dynamics arise in multimodal settings. To support this investigation, we present Idis (Images with distractors), a new visual question answering dataset that varies distractors along both semantic and numerical axes. Our findings show that visual distractors affect reasoning VLMs through a mechanism distinct from that of textual distractors: inverse scaling persists, but visual distractors reduce accuracy without lengthening the reasoning process. We further show that attribute counts extracted from reasoning traces provide key insight into how distractors relate to both accuracy and reasoning length. Finally, as a validation step, we introduce a simple prompting technique designed to reduce distractor-driven predictions in reasoning VLMs.

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC