FLaG: Fine-Grained Latent Grouping for Hallucination Detection
Original: arXiv:2606.00301v1
Announce Type: new
Abstract: Detecting hallucinations in large language models (LLMs) is a complex challenge because these errors stem from diverse failure mechanisms, rendering single global uncertainty scores unreliable. To address this, we reframe hallucination detection as a mechanism-aware evidence aggregation task. This approach interprets various representation- and token-level signals through the lens of multiple latent explanations. We introduce FLaG, a lightweight framework that assesses correctness by modeling it across a series of latent evidence groups. Using an energy-based routing mechanism, each instance is softly linked to several groups, and their group-conditional reliability signals are integrated via a principled log-marginal aggregation. This architecture allows FLaG to accommodate varied hallucination patterns while maintaining invariance to specific decision thresholds and evaluation metrics. Notably, the framework functions as a frozen-model head, requiring no alterations to the base language model and adding negligible computational cost. From a theoretical standpoint, we demonstrate that FLaG aligns with optimal evidence aggregation under heterogeneous error mechanisms; specifically, the Bayes-optimal test statistic inherently follows a log-marginal form, with FLaG serving as a tractable approximation featuring a controllable error bound. Our extensive experiments, spanning multiple benchmarks and LLM backbones, show that FLaG consistently delivers state-of-the-art performance. It also proves robust in transferring across different datasets and models, maintaining effectiveness even under conditions of limited supervision.
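The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear energy and reliability heads, the sigmoid link, and every name (`flag_style_score`, `W_energy`, `W_rel`) are assumptions made for illustration. The sketch assumes that routing weights are a softmax over negative group energies and that the final score is the log-odds of correctness after marginalizing over groups.

```python
import numpy as np

def log_softmax(z, axis=-1):
    # Numerically stable log-softmax.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def logsumexp(z, axis=-1):
    # Numerically stable log-sum-exp over one axis.
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_sigmoid(z):
    # log(sigmoid(z)) computed without overflow.
    return -np.logaddexp(0.0, -z)

def flag_style_score(features, W_energy, W_rel):
    """Hypothetical log-marginal aggregation over K latent groups.

    features: (n, d) frozen-model signals (assumed precomputed)
    W_energy: (d, K) assumed linear energy head used for routing
    W_rel:    (d, K) assumed linear group-conditional reliability head
    Returns (n,) log-odds that each instance is correct.
    """
    energies = features @ W_energy        # (n, K), lower energy = better fit
    log_route = log_softmax(-energies)    # soft assignment to groups
    rel_logits = features @ W_rel         # (n, K) per-group reliability logits
    # Marginalize over groups in log space for each outcome.
    log_p_correct = logsumexp(log_route + log_sigmoid(rel_logits))
    log_p_halluc = logsumexp(log_route + log_sigmoid(-rel_logits))
    return log_p_correct - log_p_halluc
```

Under these assumptions, when every group reports the same reliability logit the routing weights cancel and the score reduces to that logit, so the mixture only matters when groups disagree about an instance.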
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





