Global News Digest

arXiv

VESTA: Visual Exploration with Statistical Tool Agents

Abstract

Fitting quantitative models to data is a pivotal component of scientific research, yet it remains one of the least automated. Although recent agent-based frameworks use language and vision-language models (VLMs) to iteratively propose and refine statistical models, these approaches often falter on complex modeling problems. To overcome these constraints, we present VESTA (Visual Exploration with Statistical Tool Agents), a framework that equips VLMs with an expanding toolkit. The toolkit guides model refinement through data transformations, hypothesis-driven visualizations, and rigorous statistical tests.

In contrast to previous systems that depend solely on iterative critique, VESTA proactively explores the data both prior to and during the refinement phase. It achieves this by selecting or generating diagnostic tools, which are added to the model’s context for potential future reuse. We assess VESTA against established baselines across three distinct toolkit configurations: a setup with no tools, one utilizing static expert-written tools, and one employing dynamic, model-generated tools.
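The idea of a toolkit that grows as diagnostics are selected or generated, and that stays available for later reuse, can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Toolkit`, `register`, `run`, `describe`, and `skewness` are hypothetical and do not come from the VESTA paper, and the real system works with VLM-generated, visualization-oriented tools rather than a single numeric statistic.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical types and names for illustration; not VESTA's actual interface.
Diagnostic = Callable[[List[float]], float]


@dataclass
class Toolkit:
    """A registry of diagnostic tools that persists across refinement rounds."""
    tools: Dict[str, Diagnostic] = field(default_factory=dict)

    def register(self, name: str, fn: Diagnostic) -> None:
        # A newly selected or generated tool is stored for future reuse.
        self.tools[name] = fn

    def run(self, name: str, data: List[float]) -> float:
        # Reuse a previously registered tool on new data.
        return self.tools[name](data)

    def describe(self) -> List[str]:
        # The tool names an agent could be shown in its context.
        return sorted(self.tools)


def skewness(data: List[float]) -> float:
    """Population skewness; a large magnitude suggests asymmetric residuals."""
    m = mean(data)
    s = pstdev(data)
    if s == 0:
        return 0.0
    return sum(((x - m) / s) ** 3 for x in data) / len(data)


toolkit = Toolkit()
toolkit.register("skewness", skewness)

# Made-up residuals with one large positive outlier.
residuals = [-1.0, -0.5, 0.0, 0.5, 6.0]
score = toolkit.run("skewness", residuals)
```

In this sketch a positive `score` would hint that the current model underfits a right-hand tail, which is the kind of signal a critic could use when proposing the next refinement.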

To facilitate this assessment, we introduce DAWN (Dataset for Automated Workflows and Numerical Modeling), a benchmark focused on distribution fitting and time series modeling. This dataset features varying difficulty levels, culminating in real-world astronomical tasks such as modeling initial mass functions and gravitational-wave chirp signals. Our results demonstrate that VESTA’s dynamic tool creation surpasses existing agentic pipelines, delivering the most significant improvements on complex and domain-specific tasks. Furthermore, we show that the dynamically generated tools are considerably more sophisticated than those from current visual tool-creation systems, offering a broader range of diagnostic categories per function and prioritizing visual outputs that VLM critics can interpret directly.
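To give a flavor of a distribution-fitting task of the kind described above, here is a minimal sketch that fits a single power law to synthetic "stellar masses" by maximum likelihood. The assumptions are ours, not the paper's: the abstract does not say which functional form DAWN uses for initial mass functions, the slope 2.35 is the classic Salpeter value chosen only for illustration, and the function names `sample_power_law` and `fit_power_law_alpha` are hypothetical.

```python
import math
import random


def sample_power_law(alpha: float, x_min: float, n: int, rng: random.Random) -> list:
    """Draw n samples from p(x) proportional to x**(-alpha) for x >= x_min,
    using inverse-transform sampling (requires alpha > 1)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_alpha(data: list, x_min: float) -> float:
    """Maximum-likelihood estimate of the exponent of a continuous power law:
    alpha_hat = 1 + n / sum(ln(x_i / x_min)) over samples with x_i >= x_min."""
    tail = [x for x in data if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic data only; no real astronomical catalog is used here.
rng = random.Random(0)
masses = sample_power_law(alpha=2.35, x_min=0.5, n=20000, rng=rng)
alpha_hat = fit_power_law_alpha(masses, x_min=0.5)
```

With this many samples the estimate should land close to the generating slope; on real data the harder part, and the part diagnostics are meant to help with, is deciding whether a single power law is an adequate model at all.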


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
