arXiv

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

June 4, 2026 · Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari, Yasamin Medghalchi, Ilker Hacihaliloglu, Mesrob Ohannessian, Lele Wang, Giuseppe Carenini

Title: VAMPS: A Benchmark for Visual-Assisted Mathematical Problem Solving

Abstract

While multimodal large language models are demonstrating growing proficiency in complex reasoning, their effectiveness frequently diminishes when they must externalize a problem via a tool and subsequently analyze the tool’s output, particularly when visual aids are involved. This limitation is significant, as engineering and scientific processes heavily depend on visualization tools for analysis, validation, and decision-making. To investigate this discrepancy, we present VAMPS (Visual-Assisted Mathematical Problem Solving), a new benchmark focused on graph-assisted mathematics.

The VAMPS dataset comprises 1,168 bilingual, multimodal multiple-choice question-answer pairs. These items are sourced from algebra and calculus problems in the Iranian University Entrance Exam and are supplemented with synthetic variants generated by LLMs and reviewed by humans. Each item was selected so that plotting offers a natural solution path: a graph of the functions involved lets the solver read off intersections, extrema, asymptotes, and other key features.
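To make the idea of a plot-based solution path concrete, the following is a minimal sketch of what "reading intersections off a graph" amounts to numerically. It is an illustration only, not the benchmark's tooling or evaluation code: the function name `find_intersections`, the sample problem (where y = x^2 meets y = x + 2), and the grid resolution are all assumptions chosen for the example.

```python
from typing import Callable, List


def find_intersections(
    f: Callable[[float], float],
    g: Callable[[float], float],
    lo: float,
    hi: float,
    samples: int = 2000,
    tol: float = 1e-9,
) -> List[float]:
    """Approximate the x-values in [lo, hi) where the curves f and g cross.

    Hypothetical helper: it samples h(x) = f(x) - g(x) on a uniform grid,
    as a plot would, and refines each sign change by bisection.
    Tangential contacts (no sign change) are not detected.
    """
    def h(x: float) -> float:
        return f(x) - g(x)

    step = (hi - lo) / samples
    roots: List[float] = []
    for i in range(samples):
        a = lo + i * step
        b = lo + (i + 1) * step
        ha, hb = h(a), h(b)
        if ha == 0.0:
            # A grid point lies exactly on an intersection.
            roots.append(a)
            continue
        if ha * hb < 0:
            # The curves cross somewhere inside (a, b): bisect to refine.
            while b - a > tol:
                m = (a + b) / 2
                hm = h(m)
                if ha * hm <= 0:
                    b = m
                else:
                    a, ha = m, hm
            roots.append((a + b) / 2)
    return roots


# Assumed example item: where does y = x^2 meet y = x + 2 on [-5, 5]?
roots = find_intersections(lambda x: x * x, lambda x: x + 2, -5.0, 5.0)
print(len(roots), [round(r, 6) for r in roots])
```

A multiple-choice item asking for the number of intersections, or for their x-coordinates, could then be answered by comparing `roots` against the options. Solving x^2 = x + 2 analytically gives the same two points, x = -1 and x = 2, which is the kind of direct solution the benchmark compares against tool-assisted solving.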

Designed to serve as both a benchmark and a diagnostic instrument, VAMPS extends beyond previous multimodal evaluations, which primarily assessed reasoning over static visual inputs. Instead, it tests whether models can construct useful graphs themselves and ground their answers in the resulting visualizations. Our results indicate that, across a diverse range of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is an intuitive strategy.

Source: arXiv