EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams
Abstract: We introduce EuraGovExam, a benchmark derived from authentic civil service examinations in five Eurasian jurisdictions: the European Union, India, Japan, South Korea, and Taiwan. It captures the real complexity of public-sector testing and comprises more than 8,000 high-resolution scanned multiple-choice questions spanning 17 academic and administrative fields. Unlike conventional benchmarks, EuraGovExam embeds every question component (problem statement, answer options, and visual cues) in a single image and supplies only a brief, standardized instruction on answer formatting. This design requires models to perform layout-sensitive, cross-lingual reasoning directly from visual input. Because the dataset is sourced exclusively from actual examination papers, it retains complex visual features such as tables, multilingual typography, and form-based structures. In our evaluations, even leading vision-language models (VLMs) reach only 86% accuracy, which underscores the benchmark's difficulty and its usefulness for exposing current model shortcomings. By prioritizing cultural authenticity, visual complexity, and linguistic diversity, EuraGovExam sets a new standard for assessing VLMs in high-stakes, image-based, multilingual settings, and it supports practical applications in e-governance, public document analysis, and fair exam preparation.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
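
The abstract describes an image-only setup in which the model receives a scanned question plus a short instruction on answer formatting, and results are reported as accuracy. The sketch below illustrates how such responses might be scored; it is not the authors' evaluation code. The prompt wording, the `Answer: <letter>` format, the option letters A to E, and the record fields (`jurisdiction`, `gold`, `response`) are all assumptions made for illustration, and the sample records are toy data rather than benchmark results.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical instruction text. The abstract only states that the prompt is
# brief, standardized, and concerns answer formatting.
PROMPT = "Answer the question shown in the image. Reply with 'Answer: <option letter>'."

# Assumed answer format: "Answer: B" or "Answer: (B)", case-insensitive.
ANSWER_RE = re.compile(r"answer\s*:\s*\(?([A-E])\b\)?", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter stated in the assumed format, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None


def accuracy_by_group(records, key):
    """Compute accuracy per group; unparseable responses count as wrong."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        if extract_choice(rec["response"]) == rec["gold"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records for illustration only (not taken from the benchmark).
records = [
    {"jurisdiction": "Japan", "gold": "B", "response": "The table gives 12. Answer: B"},
    {"jurisdiction": "Japan", "gold": "D", "response": "Answer: (a)"},
    {"jurisdiction": "India", "gold": "C", "response": "I think it is C."},
    {"jurisdiction": "India", "gold": "A", "response": "answer: A"},
]

if __name__ == "__main__":
    print(accuracy_by_group(records, "jurisdiction"))
```

Counting unparseable responses as incorrect is one possible design choice; it keeps the score tied to the formatting instruction, at the cost of penalizing models that answer correctly but ignore the requested format.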





