arXiv

CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs

Title: CultureForest: Assessing and Interpreting Cultural Norm Grounded Reasoning in LLMs

Abstract: Current studies predominantly frame cultural intelligence in Large Language Models (LLMs) as a matter of factual knowledge, often neglecting the critical question of whether models can effectively apply this information in practical contexts. To address this oversight, we present CultureForest, a novel benchmark designed for Cultural Norm Grounded Reasoning. This framework anchors each query in a concise set of atomic norms, facilitating evaluations that are both verifiable and attributable. The benchmark contains 5,378 instances spanning 53 countries or regions and covering eight distinct domains, allowing for a tiered assessment that ranges from multiple-choice questions to open-ended generation tasks.

Our comprehensive experiments demonstrate that even leading models suffer significant performance declines in open-ended scenarios, with notable disparities observed across different regions. Detailed analysis reveals several consistent trends: first, employing reasoning at test time offers minimal improvements and can actually widen existing inequalities; second, models display highly similar regional preference structures; third, model outputs tend to be notably conservative, particularly when subjected to stricter cultural constraints; and fourth, by separating the acquisition of cultural knowledge from the act of cultural reasoning, we demonstrate that while LLMs hold considerable cultural knowledge, their effectiveness is hindered by their inability to utilize it efficiently. These results underscore the need to shift evaluation focus from mere knowledge retention to the measurement of knowledge-grounded reasoning.
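To make the two headline measurements concrete (the drop from multiple-choice to open-ended tasks, and the disparity across regions), here is a minimal Python sketch. It is not the authors' evaluation code: the record layout, the region labels "A" and "B", and the helper names `accuracy_by` and `disparity` are all assumptions made for illustration, and the toy records are invented, not results from the paper.

```python
from collections import defaultdict

# Hypothetical toy records: (region, task_format, is_correct).
# Illustrative only; these are NOT results from the paper.
RECORDS = [
    ("A", "multiple_choice", True),
    ("A", "multiple_choice", True),
    ("A", "open_ended", True),
    ("A", "open_ended", False),
    ("B", "multiple_choice", True),
    ("B", "multiple_choice", False),
    ("B", "open_ended", False),
    ("B", "open_ended", False),
]


def accuracy_by(records, key_index):
    """Return {key: fraction correct}, grouping on the given tuple index."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        key = rec[key_index]
        totals[key] += 1
        correct[key] += int(rec[2])
    return {k: correct[k] / totals[k] for k in totals}


def disparity(scores):
    """Gap between the best- and worst-scoring groups."""
    return max(scores.values()) - min(scores.values())


by_format = accuracy_by(RECORDS, 1)  # group by task format
by_region = accuracy_by(RECORDS, 0)  # group by region

# Accuracy lost when moving from multiple-choice to open-ended tasks.
format_drop = by_format["multiple_choice"] - by_format["open_ended"]
# Spread between the best- and worst-served regions.
region_gap = disparity(by_region)
```

A max-minus-min gap is only one possible disparity measure; the variance across regions, or the gap to a reference region, would be equally reasonable choices under the same assumptions.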


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
