Enhancing Hallucination Detection through Noise Injection
Abstract:
Large Language Models (LLMs) frequently produce responses that sound convincing but are factually incorrect, a phenomenon known as hallucination. Accurately detecting these errors is therefore essential for the safe deployment of LLMs. Recent studies have linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring the dispersion of the answer distribution obtained from multiple model samples. Although sampling from the model's token distribution is the intuitive way to obtain these samples, this study argues that it is sub-optimal for hallucination detection. We show that detection accuracy improves significantly when model uncertainty is also taken into account in the Bayesian sense. To this end, we introduce a simple, training-free technique that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. Our experiments show that this method substantially improves inference-time hallucination detection over standard sampling across a wide range of datasets, model architectures, and uncertainty metrics.
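The detection recipe described above (draw several answers, perturb hidden activations while drawing them, then score the dispersion of the answers) can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer "model", its weights, and the answer set are hypothetical; additive uniform noise on every hidden unit stands in for "perturbing an appropriate subset" of activations; and the entropy of the empirical answer distribution is used as one possible uncertainty metric.

```python
import math
import random
from collections import Counter

# Hypothetical toy "model": one hidden layer mapping a fixed input to logits
# over a small answer set. Weights are made up purely for illustration.
W1 = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4]]                  # hidden_dim x input_dim
W2 = [[1.2, -0.7, 0.1], [0.2, 0.9, -0.4], [-0.3, 0.1, 1.0]]  # num_answers x hidden_dim
ANSWERS = ["Paris", "Lyon", "Marseille"]


def forward(x, noise_scale=0.0, rng=None):
    """Return answer logits, optionally perturbing the hidden activations."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]  # ReLU
    if noise_scale > 0.0:
        # Assumption: additive noise drawn from U(0, noise_scale) on every hidden unit.
        hidden = [h + rng.uniform(0.0, noise_scale) for h in hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def sample_answer(logits, temperature, rng):
    """Standard sampling: draw an index from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def answer_entropy(answers):
    """Entropy (in nats) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def detect(x, num_samples=20, temperature=1.0, noise_scale=0.0, seed=0):
    """Sample answers (with optional activation noise) and return their entropy."""
    rng = random.Random(seed)
    answers = []
    for _ in range(num_samples):
        # Fresh noise per sample: each draw sees a differently perturbed model.
        logits = forward(x, noise_scale, rng)
        answers.append(ANSWERS[sample_answer(logits, temperature, rng)])
    return answer_entropy(answers)


if __name__ == "__main__":
    x = [1.0, 0.5]
    print("standard sampling entropy:", detect(x, noise_scale=0.0))
    print("with noise injection:     ", detect(x, noise_scale=1.0))
```

Setting `noise_scale=0.0` recovers plain temperature sampling, so the two calls in the `__main__` block compare the baseline with the noise-injected variant on the same input. In practice a higher entropy would flag a likely hallucination against some threshold; the threshold and `noise_scale` value here are illustrative placeholders, not values from the paper.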
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC




