GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving
Title: GeoDrive-Bench: Evaluating Geographically Tailored Multimodal Reasoning for Self-Driving Systems
Abstract:
While vision-language models (VLMs) have shown significant potential for autonomous driving, their ability to handle region-specific traffic regulations remains largely unexamined. This leaves their viability for deployment across diverse global markets uncertain. To address this gap, we present GeoDrive-Bench, a new benchmark that systematically assesses the geo-culturally grounded driving reasoning of VLMs. The dataset comprises 5,053 human-verified multiple-choice question-answer pairs spanning six countries, selected to reflect a wide range of driving cultures. The benchmark covers four core driving competencies: perception, prediction, planning, and region-specific reasoning. Crucially, each query requires the model to infer the appropriate driving action from visual cues and local traffic norms, without being given an explicit country identifier. Beyond evaluation, we develop a distillation algorithm that embeds region-specific traffic-rule knowledge into the internal representations of VLMs, helping models align their understanding of visual scenes with local driving policies. Experiments on nine state-of-the-art VLMs reveal significant performance disparities across geo-driving cultures on every task, whereas our proposed baseline models show improved geo-cultural reasoning across these regions. These findings indicate that current VLMs still lack robust, region-aware driving intelligence, and they position GeoDrive-Bench as a diagnostic and training-oriented platform for developing deployable autonomous driving foundation models.
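To make the notion of a cross-country performance disparity concrete, the following is a minimal Python sketch of how accuracy on multiple-choice items might be tabulated per task and per country. The record layout (the hypothetical `MCQItem` with `country`, `task`, and `answer` fields), the exact-match scoring on option letters, and the max-minus-min "gap" metric are all illustrative assumptions; they are not the benchmark's released schema or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    # Hypothetical record layout; the actual benchmark schema is not specified here.
    country: str  # country label, used only for grouping (never shown to the model)
    task: str     # e.g. "perception", "prediction", "planning", "region-specific reasoning"
    answer: str   # gold option letter, e.g. "B"


def accuracy_by_country_and_task(items, predictions):
    """Return {task: {country: accuracy}} for predicted option letters aligned with items."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for item, pred in zip(items, predictions):
        total[item.task][item.country] += 1
        # Assumed scoring rule: exact match on the option letter,
        # ignoring case and surrounding whitespace.
        if pred.strip().upper() == item.answer.strip().upper():
            correct[item.task][item.country] += 1
    return {
        task: {country: correct[task][country] / n for country, n in per_country.items()}
        for task, per_country in total.items()
    }


def cross_country_gap(acc_by_task):
    """Per task, the spread (max minus min) of accuracy across countries (assumed metric)."""
    return {task: max(accs.values()) - min(accs.values()) for task, accs in acc_by_task.items()}
```

For example, with two placeholder countries on a planning task, a model that answers one of two `country_a` items and both `country_b` items correctly gets accuracies of 0.5 and 1.0, for a gap of 0.5. A per-task spread of this kind is one simple way to express "disparities across geo-driving cultures for every task"; other summaries, such as the standard deviation across countries, would serve equally well.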
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC