HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers
Abstract:
Applying vision-language models (VLMs) to real-world document comprehension hinges on their ability to interpret chart and table images. Although English-language benchmarks have advanced rapidly, resources for non-English languages remain limited, leaving it unclear whether recent progress carries over to other languages. A major hurdle is the difficulty of gathering large-scale, realistic, and diverse non-English chart and table images. To overcome this, we propose using governmental white papers as a scalable resource for building benchmarks beyond English. These documents offer freely accessible, naturally occurring charts and tables across various domains and formats in numerous countries.
As a first instantiation of this approach, we present HakushoBench, a rigorous Japanese chart and table VQA benchmark constructed from 33 governmental white papers. The dataset comprises 2,053 images covering more than 10 distinct image types, accompanied by manually annotated question-answer pairs. It is designed to evaluate deep, holistic comprehension of visual data rather than reliance on superficial local visual cues. Our experiments with a wide array of VLMs show that HakushoBench remains challenging for open-weight models. The top-performing open-weight model attained an accuracy of just 58.6%, and a gap of 34.9 points between open-weight and proprietary models indicates substantial room for improvement in complex chart and table understanding. We have made both the dataset and the associated code publicly available.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





