Title: Herculean: A Benchmark for Agentic Financial Intelligence
Abstract:
As artificial intelligence agents continue to evolve, the central question has shifted from whether they can handle isolated, well-defined financial tasks to whether they can reliably carry out the duties of financial professionals. Current financial benchmarks give an incomplete picture of this capability because they focus predominantly on static competencies such as question answering, information retrieval, summarization, and classification. To address this limitation, we present Herculean, the first skill-based benchmark for agentic financial intelligence. It covers four representative workflows: Trading, Hedging, Market Insights, and Auditing.
Each workflow is implemented as a standardized skill environment built on the Model Context Protocol (MCP), with its own tools, interaction dynamics, constraints, and success criteria. This design enables consistent, end-to-end evaluation of heterogeneous agent systems. Our evaluation of frontier agents reveals a clear split in performance: agents are relatively proficient at Trading and Market Insights but struggle significantly with Hedging and Auditing, which demand long-horizon coordination, state consistency, and structured verification. These findings expose a critical gap in current agent capabilities: agents cannot yet reliably translate financial reasoning into dependable workflow execution in high-stakes financial environments.
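To make the notion of a skill environment more concrete, the following minimal Python sketch shows one way its ingredients could fit together. It has a set of tools, constraints checked after every tool call (a rough stand-in for state consistency), and a success criterion evaluated on the final state of an end-to-end episode. Everything here is hypothetical and illustrative: `SkillEnvironment`, `run_episode`, and the toy `buy` tool are assumed names. The sketch does not use the MCP SDK and is not Herculean's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


# Hypothetical skill environment: named tools, per-step constraints,
# and a success criterion on the final state.
@dataclass
class SkillEnvironment:
    name: str
    tools: Dict[str, Callable[[State, Dict[str, Any]], State]]
    constraints: List[Callable[[State], bool]]
    success: Callable[[State], bool]
    initial_state: State = field(default_factory=dict)

    def run_episode(self, actions: List[Tuple[str, Dict[str, Any]]]) -> bool:
        """Replay an agent's tool calls end to end and return pass/fail."""
        state = dict(self.initial_state)
        for tool_name, args in actions:
            if tool_name not in self.tools:
                return False  # calling a nonexistent tool fails the episode
            state = self.tools[tool_name](state, args)
            if not all(check(state) for check in self.constraints):
                return False  # any constraint violation fails the episode
        return self.success(state)


# Hypothetical toy trading tool: buys qty units at price, updating cash and position.
def buy(state: State, args: Dict[str, Any]) -> State:
    new_state = dict(state)
    new_state["cash"] -= args["qty"] * args["price"]
    new_state["position"] += args["qty"]
    return new_state


trading = SkillEnvironment(
    name="toy_trading",
    tools={"buy": buy},
    constraints=[lambda s: s["cash"] >= 0],  # no overdrawing cash
    success=lambda s: s["position"] == 10,   # target position of 10 units
    initial_state={"cash": 1000.0, "position": 0},
)

print(trading.run_episode([("buy", {"qty": 10, "price": 50.0})]))  # True
print(trading.run_episode([("buy", {"qty": 30, "price": 50.0})]))  # False
```

In the first episode the agent spends 500 of its 1000 in cash and reaches the target position, so it passes. In the second, the trade would drive cash to -500 and violate the constraint. The design scores the whole trajectory rather than a single answer, which is the distinction the abstract draws between workflow execution and static question answering.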
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




