Relational Linearity is a Predictor of Hallucinations
Title: Relational Linearity Predicts Model Hallucinations
Abstract:
Hallucination is a primary failure mode of language models (LMs). This study investigates hallucinations in response to queries about synthetic entities crafted to be unfamiliar to the model, for example asking which instrument a made-up person played, modeled on factual questions such as "Which instrument did Glenn Gould play?". Our analysis shows that instruction-tuned models, including Gemma-7B-IT, frequently hallucinate on such queries; that is, they struggle to recognize that the facts they generate lie outside their actual knowledge.
Drawing on the concept of linear relational embeddings, we propose two hypotheses: (i) For linear relations, LMs rely on an abstract representation scheme that lets them readily generate plausible objects even for non-existent subjects, thereby inducing hallucinations. (ii) For nonlinear relations, this mechanism for object generation is unavailable, making it easier for the model to avoid hallucinating.
To test these hypotheses, we developed SyntHal, a synthetic benchmark featuring unknown entities across 15 distinct relations. Our results show that relational linearity is a robust predictor of whether a model will hallucinate an object for an unknown subject rather than refuse to answer. This correlation holds across four instruction-tuned models, with correlation coefficients $r \in [.58, .84]$.
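To make the reported statistic concrete, the following is a minimal sketch of how a per-relation correlation of this kind could be computed. It assumes that $r$ denotes Pearson's correlation coefficient and that each relation has one linearity score and one hallucination rate. The function name `pearson_r` and all numbers are hypothetical placeholders for illustration; they are not values from SyntHal or from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, one entry per relation (NOT results from the paper).
linearity = [0.1, 0.3, 0.5, 0.7, 0.9]
hallucination_rate = [0.20, 0.25, 0.50, 0.60, 0.80]

r = pearson_r(linearity, hallucination_rate)
print(f"r = {r:.2f}")
```

Under these assumptions, a positive $r$ means that relations with higher linearity scores tend to show higher hallucination rates on unknown subjects, which is the direction of the effect the abstract describes.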
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



