VESTA: Visual Exploration with Statistical Tool Agents
Abstract
While fitting quantitative models to data is a pivotal component of scientific research, it remains one of the least automated processes. Although recent agent-based frameworks use language and vision-language models (VLMs) to iteratively propose and improve statistical models, these approaches often falter on complex modeling challenges. To overcome these constraints, we present VESTA (Visual Exploration with Statistical Tool Agents), a novel framework that equips VLMs with an expanding toolkit. This system guides model refinement through data transformations, hypothesis-driven visualizations, and rigorous statistical tests.
In contrast to previous systems that depend solely on iterative critique, VESTA proactively explores the data both prior to and during the refinement phase. It achieves this by selecting or generating diagnostic tools, which are added to the model’s context for potential future reuse. We assess VESTA against established baselines across three distinct toolkit configurations: a setup with no tools, one utilizing static expert-written tools, and one employing dynamic, model-generated tools.
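The select-or-generate behavior described above can be illustrated with a minimal sketch. It is not VESTA's implementation: the names `ToolRegistry`, `select_or_create`, and `make_skewness_tool` are hypothetical, and a plain Python factory stands in for the step where a model would generate a new diagnostic tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import statistics

# A diagnostic tool maps a data sample to a dictionary of named results.
Tool = Callable[[List[float]], Dict[str, float]]


@dataclass
class ToolRegistry:
    """Hypothetical growing toolkit that maps tool names to diagnostics."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def select_or_create(self, name: str, factory: Callable[[], Tool]) -> Tool:
        """Reuse a stored tool if one exists; otherwise create and keep it."""
        if name not in self.tools:
            # Assumption: in an agentic system this call would be replaced
            # by model-driven tool generation.
            self.tools[name] = factory()
        return self.tools[name]

    def describe(self) -> str:
        """List the available tools, e.g. for inclusion in a model's context."""
        return ", ".join(sorted(self.tools))


def make_skewness_tool() -> Tool:
    """Build a simple diagnostic that reports population skewness."""
    def skewness(data: List[float]) -> Dict[str, float]:
        mean = statistics.fmean(data)
        sd = statistics.pstdev(data)
        if sd == 0:
            return {"skewness": 0.0}
        third_moment = sum((x - mean) ** 3 for x in data) / len(data)
        return {"skewness": third_moment / sd ** 3}
    return skewness


if __name__ == "__main__":
    registry = ToolRegistry()
    tool = registry.select_or_create("skewness", make_skewness_tool)
    print(tool([1.0, 1.0, 1.0, 10.0]))
    # A second request reuses the stored tool rather than rebuilding it.
    assert registry.select_or_create("skewness", make_skewness_tool) is tool
    print(registry.describe())
```

In this sketch the "no tools" configuration would correspond to an empty registry that is never extended, the static configuration to a registry pre-filled with expert-written functions, and the dynamic configuration to one that grows through `select_or_create`. This mapping is an illustrative reading of the abstract, not a description of the authors' code.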
To facilitate this assessment, we introduce DAWN (Dataset for Automated Workflows and Numerical Modeling), a benchmark focused on distribution fitting and time series modeling. This dataset features varying difficulty levels, culminating in real-world astronomical tasks such as modeling initial mass functions and gravitational-wave chirp signals. Our results demonstrate that VESTA’s dynamic tool creation surpasses existing agentic pipelines, delivering the most significant improvements on complex and domain-specific tasks. Furthermore, we show that the dynamically generated tools are considerably more sophisticated than those from current visual tool-creation systems, offering a broader range of diagnostic categories per function and prioritizing visual outputs that VLM critics can interpret directly.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




