arXiv

Can Generalist Agents Automate Data Curation?

Abstract

Curating training data represents one of the most critical and labor-intensive phases of contemporary AI development. Practitioners typically engage in an iterative cycle of proposing, implementing, evaluating, and refining data policies in response to noisy feedback from benchmarks. This study investigates whether generalist coding agents can automate this data-curation workflow. To this end, we present Curation-Bench, a benchmark designed with an agent-centric approach. This framework keeps the model, training recipe, and evaluation suite constant, while granting agents command-line access to inspect data, enact policies, submit them to a fixed training and evaluation pipeline, and subsequently revise their approach.

In experiments involving vision-language instruction tuning, out-of-the-box agents achieved performance comparable to strong published data-selection baselines within just ten iterations. However, an analysis of agent trajectories highlights a persistent "execution-research gap": agents primarily adjusted local variants of a policy rather than exploring new policy families, even when provided with strategic guides and references to existing papers. By introducing scaffolds that require agents to cite, instantiate, and adapt prior methods at each iteration, we shifted the agents toward method-guided exploration. The resulting scaffolded agent autonomously composed a data-selection policy, with no human design input, that surpassed strong published baselines while using only one-tenth of their data budget. Ultimately, while current agents can execute the curation loop, reliable data research requires scaffolded method adaptation rather than open-ended prompting alone. The code and benchmark are publicly available.
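The propose, implement, evaluate, and refine cycle described above can be illustrated with a toy sketch. This is not the Curation-Bench interface: every name here (`Example`, `threshold_policy`, `fixed_pipeline`, `curation_loop`, `sweep_proposer`) is hypothetical, the per-example `quality` score is an assumed stand-in for whatever signals an agent would inspect, and the scoring function is an arbitrary deterministic placeholder for the fixed training and evaluation pipeline. Only the ten-iteration budget is taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Example:
    text: str
    quality: float  # hypothetical per-example score in [0, 1]


Policy = Callable[[List[Example]], List[Example]]
History = List[Tuple[float, float]]  # (policy parameter, benchmark score)


def threshold_policy(min_quality: float) -> Policy:
    """Build a policy that keeps examples at or above a quality threshold."""
    def select(pool: List[Example]) -> List[Example]:
        return [ex for ex in pool if ex.quality >= min_quality]
    return select


def fixed_pipeline(subset: List[Example]) -> float:
    """Placeholder for the fixed train-and-evaluate step (assumption).

    Rewards high mean quality but penalizes subsets smaller than 4 examples,
    so neither "keep everything" nor "keep almost nothing" is optimal.
    """
    if not subset:
        return 0.0
    mean_quality = sum(ex.quality for ex in subset) / len(subset)
    coverage = min(1.0, len(subset) / 4)
    return mean_quality * coverage


def sweep_proposer(history: History) -> float:
    """Propose the next threshold: 0.0, 0.1, ..., one step per iteration.

    This only varies one parameter of a single policy family, which is the
    kind of local adjustment the abstract contrasts with exploring new families.
    """
    return len(history) / 10


def curation_loop(
    pool: List[Example],
    propose: Callable[[History], float],
    iterations: int = 10,
) -> Tuple[float, float, History]:
    """Run propose -> implement -> evaluate -> record for a fixed budget."""
    history: History = []
    best_param, best_score = 0.0, float("-inf")
    for _ in range(iterations):
        param = propose(history)                 # propose a policy variant
        subset = threshold_policy(param)(pool)   # implement it on the pool
        score = fixed_pipeline(subset)           # evaluate with the fixed pipeline
        history.append((param, score))           # feedback for the next proposal
        if score > best_score:
            best_param, best_score = param, score
    return best_param, best_score, history


if __name__ == "__main__":
    pool = [Example(f"sample-{i}", q) for i, q in
            enumerate([0.05, 0.15, 0.35, 0.55, 0.65, 0.75, 0.85, 0.95])]
    best_param, best_score, history = curation_loop(pool, sweep_proposer)
    print(f"best threshold={best_param}, score={best_score:.3f}")
```

In this sketch the model, recipe, and evaluation are collapsed into one fixed function, so the only thing the loop can change is the data policy, which mirrors the controlled setup the abstract describes. A scaffolded variant would replace `sweep_proposer` with a proposer that draws on different policy families rather than a single threshold.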


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
