arXiv

ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats

Abstract

While charts serve as a fundamental tool for communicating quantitative and relational data, the systematic assessment of chart parsing models continues to present significant challenges. Current benchmarks are often restricted to limited chart types, neglecting diagrammatic structures like mind maps and flowcharts. Furthermore, existing models generate outputs in incompatible formats, and datasets frequently fail to incorporate real-world variations such as printed or hand-drawn images.

To resolve these limitations, we present ChartArena, a robust bilingual benchmark that encompasses eight distinct chart families, bridging both numeric charts and diagrammatic structures. Each category is assessed across three visual contexts: digital renderings, photographs of printed documents, and images of hand-drawn sketches. The dataset’s reliability is ensured through a human-agent collaborative annotation pipeline featuring multi-stage human verification.

To enable fair comparison across models, we develop a format-agnostic evaluation protocol. It maps heterogeneous model outputs into two standardized semantic spaces, a normalized triple view and a directed graph view, so that every output can be scored with the same structure-aware metrics.
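As a rough illustration of what mapping different output formats into a shared triple view and a shared graph view could look like, here is a minimal Python sketch. It is not the ChartArena implementation: the two input formats, the function names (`triples_from_json`, `triples_from_csv`, `triple_f1`, `edge_f1`), the lowercase normalization, and the 5% relative tolerance on values are all assumptions made for the example, and the benchmark's actual metrics may differ.

```python
import csv
import io
import json


def triples_from_json(text):
    """Parse a hypothetical JSON output {series: {category: value}} into triples."""
    data = json.loads(text)
    return {
        (series.strip().lower(), category.strip().lower(), float(value))
        for series, points in data.items()
        for category, value in points.items()
    }


def triples_from_csv(text):
    """Parse a hypothetical CSV output (first column = category, other columns = series)."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    header, body = rows[0], rows[1:]
    return {
        (header[i].strip().lower(), row[0].strip().lower(), float(row[i]))
        for row in body
        for i in range(1, len(header))
    }


def triple_f1(predicted, reference, rel_tol=0.05):
    """F1 over (series, category, value) triples.

    Labels must match exactly after normalization; values match when they
    fall within an assumed relative tolerance of the reference value.
    """
    if not predicted or not reference:
        return 0.0

    def matches(p, r):
        return (
            p[0] == r[0]
            and p[1] == r[1]
            and abs(p[2] - r[2]) <= rel_tol * max(abs(r[2]), 1e-9)
        )

    unmatched = list(reference)
    hits = 0
    for p in predicted:
        for r in unmatched:
            if matches(p, r):
                unmatched.remove(r)  # each reference triple is matched at most once
                hits += 1
                break
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def edge_f1(predicted_edges, reference_edges):
    """F1 over directed (source, target) edges for diagram-style charts."""
    pred = {(s.strip().lower(), t.strip().lower()) for s, t in predicted_edges}
    ref = {(s.strip().lower(), t.strip().lower()) for s, t in reference_edges}
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(pred), hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Two models describe the same bar chart in different formats.
json_output = '{"Sales": {"Q1": 10, "Q2": 20}}'
csv_output = "Quarter,Sales\nQ1,10\nQ2,20\n"
same_chart_score = triple_f1(triples_from_json(json_output), triples_from_csv(csv_output))

# A flowchart prediction that gets one of two edges right.
flow_score = edge_f1([("A", "B"), ("B", "C")], [("a", "b"), ("b", "d")])
```

In this sketch the JSON and CSV outputs reduce to the same set of triples, so neither model is penalized for its choice of format. Diagrammatic outputs such as flowcharts are compared as sets of directed edges.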

Our extensive evaluation of 26 state-of-the-art Multimodal Large Language Models (MLLMs) yielded three key insights: (1) leading proprietary models, such as Gemini 3.1 Pro, dominate overall performance, although top-tier open-source systems are quickly narrowing the gap; (2) document parsing models perform adequately on numeric charts, but their performance drops significantly on diagrammatic structures; and (3) specialized expert chart parsers remain confined to specific, narrow chart families.

Radar charts and hand-drawn scenarios emerged as particularly difficult challenges across all tested models. These results highlight distinct capability gaps within current technology and establish ChartArena as a unified foundation for advancing future research. The ChartArena dataset and resources are publicly accessible at https://github.com/pspdada/ChartArena.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
