TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages
Abstract:
The safety assessment of Large Language Models (LLMs) remains predominantly focused on English, leaving a significant gap in the examination of Low-Resource Languages (LRLs), especially those spoken in Africa. To address this disparity, we present TukaBench, a novel jailbreak benchmark designed for seven African languages. The benchmark extends JailbreakBench (JBB) beyond direct translation by employing four distinct methodological settings: human-translated JBB prompts; English prompts adapted to African cultural contexts before human translation; human-curated prompts verified via interactions with GPT-5.2; and code-switched prompts that blend English with African languages. Together, these settings make it possible to isolate how language, cultural context, and prompt evasion tactics each influence model safety.
Our findings indicate that, across both closed and open-source models, prompting in African languages lowers the refusal rate relative to English, with culturally adapted prompts yielding the lowest refusal rates. The evaluation further highlights two structural deficiencies: failures in model comprehension and diminished reliability when using LLMs as judges in LRL contexts. To account for comprehension issues, we introduce the metric "Deflection" in addition to "Refused" and "Jailbroken." To address judge reliability, we use human annotations to validate model outputs, revealing that agreement between automated judges and humans declines in lower-resource languages and for less commonly supported scripts.
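As a rough illustration of the three-way labeling and the judge validation described above, the sketch below computes the share of responses per label and the raw agreement between an automated judge and human annotators. The abstract does not specify how these quantities are computed, so the function names, the use of simple percent agreement, and the toy labels are all assumptions made for illustration; the paper may rely on a different statistic.

```python
from collections import Counter

# The three outcome labels named in the abstract.
LABELS = ("Refused", "Jailbroken", "Deflection")


def label_rates(labels):
    """Return the share of responses carrying each label (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    return {name: (counts[name] / total if total else 0.0) for name in LABELS}


def raw_agreement(judge_labels, human_labels):
    """Fraction of responses where judge and human labels match.

    Simple percent agreement is an assumption; it is not stated in the abstract.
    """
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not judge_labels:
        return 0.0
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Made-up toy labels for four responses in one language (not benchmark data).
human = ["Refused", "Jailbroken", "Deflection", "Refused"]
judge = ["Refused", "Jailbroken", "Refused", "Refused"]

rates = label_rates(human)
agreement = raw_agreement(judge, human)
```

Computing `raw_agreement` separately for each language or script would show where judge reliability drops. A chance-corrected statistic could be substituted without changing the overall structure.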
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




