arXiv

HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers

Title: HakushoBench: A Japanese Chart and Table VQA Benchmark Derived from Governmental White Papers

Abstract:

The application of vision-language models (VLMs) to real-world document comprehension hinges on the ability to interpret chart and table images. Although English-language benchmarks have advanced rapidly, resources for non-English languages remain limited, raising the question of whether recent progress carries over to other languages. A major hurdle is the difficulty of gathering large-scale, realistic, and diverse non-English chart and table imagery. To overcome this, we propose using governmental white papers as a scalable resource for building benchmarks beyond English. These documents offer freely accessible, naturally occurring charts and tables across various domains and formats in numerous countries.

In our initial implementation, we present HakushoBench, a rigorous Japanese chart and table VQA benchmark constructed from 33 governmental white papers. The dataset comprises 2,053 images covering more than 10 distinct image types, accompanied by manually annotated question-and-answer pairs. It is specifically designed to evaluate deep, holistic comprehension of visual data, moving beyond reliance on superficial local visual cues. Our experiments with a wide array of VLMs reveal that HakushoBench continues to pose significant challenges for open-weight models. The top-performing open-weight model attained an accuracy of just 58.6%, while a disparity of 34.9 points between open-weight and proprietary models underscores substantial opportunities for advancement in complex chart and table understanding. We have made both the dataset and the associated code publicly available.


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
