arXiv

Geometry-Aware Hallucination Detection in Large Language Models

Abstract:

Large language models (LLMs) are prone to producing content that is factually inaccurate or lacks support, a phenomenon widely known as hallucination. While previous research has investigated various mitigation techniques—including decoding strategies, retrieval-augmented generation, and supervised fine-tuning—recent findings highlight the significant impact of in-context learning (ICL) on factual accuracy. Despite this, current methods for selecting ICL demonstrations often depend on superficial similarity heuristics, resulting in limited robustness across different models and tasks.

To address these limitations, we introduce GA-ICL, a geometry-aware framework for sampling in-context demonstrations. This approach uses latent representations derived from frozen LLMs to select examples based on their proximity to learned prototypes, rather than relying solely on lexical or embedding similarity. In making this selection, GA-ICL combines local manifold structure with class-aware prototype geometry.
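As a rough illustration of prototype-based demonstration selection, the sketch below scores each candidate by a weighted sum of its distance to the query (a stand-in for local structure) and its distance to its class prototype. This is not the authors' implementation. Everything here is an assumption made for illustration: prototypes are taken as plain class centroids rather than learned, the distance is Euclidean, the mixing weight `alpha` and all function names are hypothetical, and the toy 2-D vectors stand in for hidden states that would come from a frozen LLM.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototypes(embeddings, labels):
    """ASSUMPTION: each prototype is the centroid of its class.
    The abstract says prototypes are learned; it does not say how."""
    by_class = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        by_class[lab].append(vec)
    return {lab: mean_vector(vecs) for lab, vecs in by_class.items()}


def select_demonstrations(query, embeddings, labels, k_per_class=1, alpha=0.5):
    """Return indices of the k lowest-scoring candidates per class.

    score = alpha * dist(candidate, query)
          + (1 - alpha) * dist(candidate, prototype of candidate's class)
    `alpha` is a hypothetical mixing weight, not a parameter from the paper.
    """
    protos = class_prototypes(embeddings, labels)
    scored = defaultdict(list)
    for idx, (vec, lab) in enumerate(zip(embeddings, labels)):
        score = (alpha * euclidean(vec, query)
                 + (1 - alpha) * euclidean(vec, protos[lab]))
        scored[lab].append((score, idx))
    chosen = []
    for lab in sorted(scored):  # deterministic class order
        chosen.extend(idx for _, idx in sorted(scored[lab])[:k_per_class])
    return chosen


# Toy 2-D "latent representations" (placeholders for frozen-LLM hidden states).
embeddings = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2], [3.0, 3.0]]
labels = ["supported", "supported",
          "hallucinated", "hallucinated", "hallucinated"]
query = [0.1, 0.0]

chosen = select_demonstrations(query, embeddings, labels, k_per_class=1)
print([(i, labels[i]) for i in chosen])
```

Selecting a fixed number of candidates per class keeps the prompt balanced across labels. Setting `alpha=1.0` reduces the score to plain nearest-neighbour retrieval in embedding space, which corresponds to the kind of similarity heuristic the abstract contrasts GA-ICL against.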

Our evaluations on the FEVER benchmark for factual verification and the HaluEval benchmark for hallucination detection demonstrate that GA-ICL surpasses standard ICL selection baselines in most tested scenarios. The framework shows particularly notable improvements in dialogue and summarization tasks. Furthermore, GA-ICL maintains robustness against temperature variations and differences in model architectures, suggesting greater stability compared to heuristic retrieval methods.

Although lexical retrieval can still perform competitively in certain question-answering contexts for smaller models, our findings indicate that geometry-aware prototype selection offers a reliable, training-efficient solution for hallucination detection that does not require modifying LLM parameters. Extended tests on larger models, specifically Phi-14B and Qwen3-32B, confirm that GA-ICL scales effectively. On these larger models, GA-ICL outperforms all compared baselines, including in question-answering tasks where smaller models exhibit limitations at boundary conditions, thereby providing a principled path forward for improving ICL demonstration selection.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
