arXiv

SeSE: Black-Box Uncertainty Quantification for Large Language Models Based on Structural Information Theory

June 3, 2026 · Xingtao Zhao, Hao Peng, Dingli Su, Xianghua Zeng, Chunyang Liu, Jinzhi Liao, Philip S. Yu

Abstract: The deployment of large language models (LLMs) in safety-critical environments relies heavily on reliable uncertainty quantification (UQ). This capability allows models to withhold responses when they are uncertain, effectively preventing hallucinations, i.e., outputs that appear credible but are factually wrong. Although semantic UQ methods have demonstrated high performance, they tend to ignore the latent semantic structural information that could facilitate more accurate uncertainty estimates. To address this gap, we introduce Semantic Structural Entropy (SeSE), a principled black-box UQ framework designed for both open- and closed-source LLMs. SeSE uncovers the intrinsic structure of the semantic space by building an encoding tree that achieves minimal structural entropy, thereby creating an optimal hierarchical abstraction. The structural entropy of this tree serves as a measure of the inherent uncertainty within the LLM’s semantic space following optimal compression. Furthermore, while current methods mostly target simple, short-form generation, we extend SeSE to deliver interpretable, fine-grained uncertainty estimates for long-form outputs. We provide a theoretical proof that SeSE generalizes semantic entropy, which is considered the gold standard for LLM UQ, and we empirically validate its superior performance against strong baselines across 24 model-dataset combinations.
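
Illustrative note: the abstract does not give the construction in detail, so the following is only a minimal sketch of the quantity it names. It computes the standard two-level structural entropy from structural information theory for a fixed partition of an undirected weighted graph. The function name `structural_entropy`, the toy graph, and the restriction to undirected graphs and two-level trees are assumptions made for illustration. They are not taken from the paper, and neither SeSE's semantic graph construction nor its search for the minimizing encoding tree is shown.

```python
import math
from collections import defaultdict


def structural_entropy(edges, partition):
    """Two-level structural entropy of an undirected weighted graph.

    edges:     iterable of (u, v, weight) tuples with u != v (hypothetical input format)
    partition: list of disjoint node sets covering all nodes (children of the root)
    """
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    total_volume = sum(degree.values())  # vol(G): sum of weighted degrees

    cluster_of = {n: i for i, cluster in enumerate(partition) for n in cluster}
    cut = defaultdict(float)  # weight of edges with exactly one endpoint in a cluster
    for u, v, w in edges:
        if cluster_of[u] != cluster_of[v]:
            cut[cluster_of[u]] += w
            cut[cluster_of[v]] += w

    entropy = 0.0
    for i, cluster in enumerate(partition):
        volume = sum(degree[n] for n in cluster)
        # Leaf terms: each node measured relative to its parent cluster.
        for n in cluster:
            if degree[n] > 0:
                entropy -= degree[n] / total_volume * math.log2(degree[n] / volume)
        # Cluster term: the cluster measured relative to the root.
        if cut[i] > 0:
            entropy -= cut[i] / total_volume * math.log2(volume / total_volume)
    return entropy


# Toy graph (assumed for illustration): two triangles joined by a single edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]

flat = structural_entropy(edges, [{n} for n in range(6)])       # every node alone
single = structural_entropy(edges, [set(range(6))])             # one big cluster
clustered = structural_entropy(edges, [{0, 1, 2}, {3, 4, 5}])   # the two triangles
```

On this toy graph, the two trivial partitions both reduce to the one-dimensional structural entropy (the Shannon entropy of the degree distribution), while the partition that follows the two triangles gives a strictly lower value. Lowering the entropy by choosing a better hierarchy is the sense in which a minimal-entropy encoding tree acts as a compression of the graph.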

Source: arXiv