Global News Digest

arXiv

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali

Abstract

Although Bengali ranks as the sixth most widely spoken language globally, there has been no systematic prior research assessing hallucination within large language models (LLMs) for this language. To address this gap, we present BenHalluEval, a granular evaluation framework designed specifically for Bengali. This framework encompasses four distinct tasks: Generative Question Answering (GQA), Bangla-English Code-Mixed QA, Summarization, and Reasoning.

We generate 12,000 hallucinated samples with GPT-5.4, covering twelve task-specific hallucination types and drawn from three existing Bengali datasets. We evaluate seven LLMs, grouped as reasoning-oriented, multilingual, and Bengali-centric, under a dual-track protocol that separately measures the false-positive rate on ground-truth instances (Track A) and the hallucination detection rate on the generated candidates (Track B).
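
The two tracks can be illustrated with a minimal sketch. It assumes each model verdict has already been reduced to a boolean ("flagged as hallucinated"); the function names and the toy verdicts are hypothetical and are not taken from the released code.

```python
from typing import Sequence


def false_positive_rate(flags_on_ground_truth: Sequence[bool]) -> float:
    """Track A: share of ground-truth instances wrongly flagged as hallucinated."""
    if not flags_on_ground_truth:
        raise ValueError("Track A needs at least one verdict")
    return sum(flags_on_ground_truth) / len(flags_on_ground_truth)


def detection_rate(flags_on_hallucinated: Sequence[bool]) -> float:
    """Track B: share of generated hallucinated candidates correctly flagged."""
    if not flags_on_hallucinated:
        raise ValueError("Track B needs at least one verdict")
    return sum(flags_on_hallucinated) / len(flags_on_hallucinated)


# Toy verdicts (invented for illustration): True means "flagged as hallucinated".
track_a_flags = [False, False, True, False]  # verdicts on ground-truth instances
track_b_flags = [True, True, False, True]    # verdicts on hallucinated candidates

fpr = false_positive_rate(track_a_flags)
dr = detection_rate(track_b_flags)
print(f"Track A false-positive rate: {fpr:.2f}")
print(f"Track B detection rate: {dr:.2f}")
```

Keeping the two rates separate matters because a model that flags everything scores perfectly on Track B while failing Track A completely, and a model that flags nothing does the reverse.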

To penalize both failure modes at once and avoid score inflation from uniform response bias, we introduce BenHalluScore, a dual-track calibration metric. Scores range from 7.72% to 55.42% across the evaluated models and tasks, exposing large disparities in hallucination calibration. Chain-of-thought prompting, tested as a mitigation strategy, shifted response distributions without consistently improving the models' ability to discriminate hallucinations. BenHalluEval is the first dedicated hallucination benchmark for Bengali, and its results underscore the limits of single-track evaluation and of relying on prompting alone in low-resource language contexts. The dataset and code are available at https://anonymous.4open.science/r/BanglaHalluEval-EB77.
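
The abstract does not state the formula for BenHalluScore, so the sketch below is not that metric. It is a hypothetical stand-in (the product of Track A specificity and Track B detection rate) chosen only to show why a score that combines both tracks cannot be inflated by a uniform response bias. The function name and all input values are assumptions made for illustration.

```python
def combined_track_score(false_positive_rate: float, detection_rate: float) -> float:
    """Hypothetical stand-in, NOT BenHalluScore: (1 - FPR on Track A) * detection rate on Track B."""
    for name, value in (("false_positive_rate", false_positive_rate),
                        ("detection_rate", detection_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (1.0 - false_positive_rate) * detection_rate


# Uniform response biases collapse to zero under a combined score.
always_flag = combined_track_score(false_positive_rate=1.0, detection_rate=1.0)
never_flag = combined_track_score(false_positive_rate=0.0, detection_rate=0.0)

# A model that actually discriminates (toy rates) scores above zero.
discriminating = combined_track_score(false_positive_rate=0.25, detection_rate=0.75)

print(always_flag, never_flag, discriminating)
```

Under this stand-in, a model that labels every input as hallucinated and a model that labels none both score zero, whereas a single-track detection rate would rate the first one as perfect.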


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
