arXiv

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

June 4, 2026 · Xinyu Lu, Tianshu Wang, Pengbo Wang, Zujie Wen, Zhiqiang Zhang, Jun Zhou, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

Title: The Meta-Agent Challenge: Can Existing Models Achieve Autonomous Agent Creation?

Abstract

Current AI benchmarks primarily assess agents on how well they execute tasks within human-designed workflows. These assessments overlook a pivotal capability: whether models can engineer agent systems on their own. To address this gap, we present the Meta-Agent Challenge (MAC), an evaluation framework built to measure how well leading models can develop agents autonomously.

In this setup, a coding agent, referred to as the meta-agent, is given a sandboxed environment, an evaluation API, and a strict time limit. Its objective is to iteratively write and refine an agent artifact that achieves the highest possible performance on a held-out test set spanning five distinct domains. To preserve the integrity of the assessment, the framework incorporates multi-layered safeguards against reward hacking.
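The iterative setup described above can be pictured as a refine-and-evaluate loop under a wall-clock budget. The following is a minimal sketch under stated assumptions, not the MAC harness: `EvaluationAPI`, `run_meta_agent`, `final_test`, the evaluation-call cap, and the split between a scored development set and a separately held-out test set are all hypothetical, and agent artifacts are modeled as plain Python functions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model of an "agent artifact": a function from task input to answer.
Agent = Callable[[str], str]


@dataclass
class EvaluationAPI:
    """Hypothetical evaluation API: returns only an aggregate score, never labels.

    The labels live inside this object; in a real harness it would run in a
    separate process so the meta-agent could not read them directly.
    """
    dev_set: List[Tuple[str, str]]
    max_calls: int = 20  # assumed per-run cap on evaluation queries
    calls: int = field(default=0, init=False)

    def score(self, agent: Agent) -> float:
        if self.calls >= self.max_calls:
            raise RuntimeError("evaluation budget exhausted")
        self.calls += 1
        correct = sum(agent(x) == y for x, y in self.dev_set)
        return correct / len(self.dev_set)


def run_meta_agent(
    propose: Callable[[Agent, float], Agent],
    api: EvaluationAPI,
    initial: Agent,
    time_limit_s: float,
) -> Tuple[Agent, float]:
    """Keep the best-scoring artifact until the time or evaluation budget runs out."""
    deadline = time.monotonic() + time_limit_s
    best, best_score = initial, api.score(initial)
    while time.monotonic() < deadline and api.calls < api.max_calls:
        candidate = propose(best, best_score)  # the meta-agent writes a new artifact
        candidate_score = api.score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def final_test(agent: Agent, test_set: List[Tuple[str, str]]) -> float:
    """Scored once by the harness, outside the meta-agent's loop, on held-out data."""
    return sum(agent(x) == y for x, y in test_set) / len(test_set)


if __name__ == "__main__":
    # Toy domain (assumed): the target behaviour is upper-casing the input.
    pending: List[Agent] = [str.lower, str.strip, str.upper]

    def propose(best: Agent, best_score: float) -> Agent:
        # Stand-in for the meta-agent: try the next pre-written candidate.
        return pending.pop(0) if pending else best

    api = EvaluationAPI(dev_set=[("a", "A"), ("b", "B")], max_calls=5)
    best, dev_score = run_meta_agent(propose, api, initial=lambda x: x, time_limit_s=10.0)
    print(f"dev {dev_score:.2f}, test {final_test(best, [('hi', 'HI')]):.2f}")  # dev 1.00, test 1.00
```

Capping evaluation calls and exposing only an aggregate score are two simple, hypothetical safeguards against reward hacking; neither stops a meta-agent that can read the evaluator's memory, which is why process-level isolation would also be needed in practice.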

Our findings indicate that meta-agents seldom match the performance of human-engineered baseline policies, and the few that do perform well are predominantly built on proprietary frontier models. The development process also shows high variance. Under intense optimization pressure, agents display emergent adversarial behaviors, such as extracting ground-truth data without authorization, which exposes serious deficiencies in both model robustness and alignment. Ultimately, MAC serves as a rigorous, open-source benchmark for autonomous AI research and provides an empirical method for assessing recursive self-improvement. The benchmark is available at: https://github.com/ant-research/meta-agent-challenge.

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC