arXiv

Global PIQA: Evaluating Commonsense Reasoning Across 100+ Languages and Cultures

Abstract:

There is currently a significant lack of culturally specific evaluation benchmarks for large language models (LLMs) that cover a broad range of languages and cultures. To address this gap, we introduce Global PIQA, a participatory commonsense reasoning benchmark covering more than 100 languages. The dataset was curated by more than 350 researchers from 65 countries. Global PIQA includes 141 language varieties spanning five continents, 19 language families, and 24 writing systems.

The benchmark has two splits. In the non-parallel split, more than half of the examples reference locally relevant elements such as regional foods, customs, traditions, and other culture-specific details. The parallel split instead consists of "culturally agnostic" commonsense reasoning questions translated into 131 language varieties, which enables direct cross-lingual comparison. All examples in both splits have been validated by native speakers of the respective languages.

We find that state-of-the-art LLMs perform well on Global PIQA in aggregate but struggle considerably with lower-resource languages: in the parallel split, we observe accuracy gaps of up to 68% between languages. These findings show that, for many cultures, everyday knowledge remains a critical area for improvement in LLMs, alongside existing concerns about complex reasoning and expert knowledge. Beyond its use for LLM evaluation, Global PIQA offers insight into the rich diversity of cultures in which human language is situated.
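The cross-lingual comparison that the parallel split enables can be illustrated with a small sketch. It assumes a hypothetical record format of (language, item ID, whether the model answered correctly), scores each language only on the item IDs present in every language so that all languages are compared on translations of the same questions, and defines the gap as the difference between the best and worst language in percentage points. The function names, record format, and gap definition are illustrative assumptions, not the paper's evaluation code; the paper's exact metric may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (language, item_id, model_was_correct).
Record = Tuple[str, str, bool]


def per_language_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Accuracy per language, restricted to items present in every language.

    Restricting to shared item IDs mimics a parallel split: each language is
    scored on translations of the same underlying questions.
    """
    by_lang: Dict[str, Dict[str, bool]] = defaultdict(dict)
    for lang, item_id, correct in records:
        by_lang[lang][item_id] = correct

    if not by_lang:
        raise ValueError("no records given")

    # Keep only item IDs that every language has a result for.
    shared = set.intersection(*(set(items) for items in by_lang.values()))
    if not shared:
        raise ValueError("no items shared across all languages")

    # Booleans sum as 0/1, so this is the fraction answered correctly.
    return {
        lang: sum(items[i] for i in shared) / len(shared)
        for lang, items in by_lang.items()
    }


def max_gap_points(accuracy: Dict[str, float]) -> float:
    """Largest accuracy difference between any two languages, in percentage points."""
    return 100 * (max(accuracy.values()) - min(accuracy.values()))
```

For example, if a model answers both shared items correctly in one language but only one of them in another, `per_language_accuracy` returns accuracies of 1.0 and 0.5, and `max_gap_points` reports a 50-point gap. Items that appear in only some languages are dropped so that the gap reflects language rather than differences in question content.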


Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC