arXiv

Retrieval-Augmented Linguistic Calibration

Original (arXiv:2605.19344v2):

Linguistic cues such as "I believe" and "probably" offer an intuitive interface for communicating confidence, yet a generalisable, principled calibration framework for linguistic confidence expressions remains underexplored. In particular, co-occurring linguistic cues, contextual variation, and subjective audience interpretation pose unique challenges. We therefore model linguistic confidence as a distribution over plausible perceived probability values that a statement is correct, capturing interpretation variability that scalar representations discard. Within this distributional framework, we introduce faithfulness as a complementary evaluation dimension and present Faithfulness Divergence (FD), an information-theoretic metric quantifying the surprise induced in audience beliefs upon truth revelation. Building on these foundations, we present Retrieval-Augmented Linguistic Calibration (RALC), a lightweight post-hoc pipeline that propagates calibrated confidence signals back into natural language via retrieval-augmented rewriting. Across three QA benchmarks and five LLM families, RALC improves in-domain faithfulness and calibration up to 66% and 58%, respectively, outperforming black-box and grey-box calibration baselines.

Rewrite:

Phrases such as "I believe" and "probably" are an intuitive way to convey confidence, yet a generalizable, principled framework for calibrating such expressions remains underexplored. Three things make the problem hard: several cues can co-occur in one statement, their meaning shifts with context, and audiences interpret them subjectively. We therefore model linguistic confidence not as a single value but as a distribution over the probabilities an audience might plausibly perceive that a statement is correct. This retains the variability in interpretation that scalar representations discard.
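To make the contrast between scalar and distributional representations concrete, here is a minimal Python sketch. It is not the paper's method: the ratings, the `summarize` and `histogram` helpers, and the choice of five bins are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

# Hypothetical perceived-probability ratings (0..1) from ten readers per phrase.
# These numbers are invented for illustration; they are not data from the paper.
ratings = {
    "probably": [0.60, 0.65, 0.70, 0.70, 0.70, 0.70, 0.70, 0.75, 0.75, 0.75],
    "I believe": [0.40, 0.50, 0.55, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.90],
}

def summarize(samples):
    """Return (mean, population standard deviation) of the ratings."""
    return mean(samples), pstdev(samples)

def histogram(samples, bins=5):
    """Bin ratings into equal-width bins on [0, 1] and normalize to sum to 1."""
    counts = [0] * bins
    for p in samples:
        # Clamp so that a rating of exactly 1.0 falls in the last bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
    return [c / len(samples) for c in counts]

if __name__ == "__main__":
    for phrase, samples in ratings.items():
        avg, spread = summarize(samples)
        print(f"{phrase!r}: mean={avg:.2f} spread={spread:.3f} hist={histogram(samples)}")
```

In this invented data both phrases have a mean of about 0.70, so a scalar summary treats them as identical, while the spread and the histogram show that readers disagree far more about "I believe" than about "probably".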

Within this distributional framework, we introduce faithfulness as a complementary evaluation dimension and present Faithfulness Divergence (FD), an information-theoretic metric that quantifies the surprise induced in an audience's beliefs when the truth is revealed. Building on these foundations, we present Retrieval-Augmented Linguistic Calibration (RALC), a lightweight post-hoc pipeline that uses retrieval-augmented rewriting to propagate calibrated confidence signals back into natural language.
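The abstract does not give the formula for FD, so the following Python sketch uses a generic stand-in: the expected surprisal (negative log-likelihood, in bits) of the revealed truth under a discrete belief distribution. The function `expected_surprisal` and the two example beliefs are assumptions made for illustration and should not be read as the paper's definition.

```python
import math

def expected_surprisal(belief, correct):
    """Average surprisal in bits when the truth is revealed.

    belief:  list of (perceived_probability, weight) pairs; weights sum to 1.
    correct: True if the statement turned out to be correct.
    This is a stand-in for illustration, not the paper's FD definition.
    """
    total = 0.0
    for p, weight in belief:
        likelihood = p if correct else 1.0 - p
        # Clamp to avoid log(0) when a reader was fully certain and wrong.
        likelihood = min(max(likelihood, 1e-12), 1.0)
        total += weight * -math.log2(likelihood)
    return total

# Hypothetical audience beliefs induced by two phrasings of the same answer.
confident = [(0.90, 0.5), (0.95, 0.5)]  # e.g. "I am certain that ..."
hedged = [(0.50, 0.5), (0.60, 0.5)]     # e.g. "It might be that ..."

if __name__ == "__main__":
    for name, belief in (("confident", confident), ("hedged", hedged)):
        print(name,
              "surprise if wrong:", round(expected_surprisal(belief, False), 3),
              "surprise if right:", round(expected_surprisal(belief, True), 3))
```

Under this stand-in, confident wording attached to a wrong answer produces much more surprise than hedged wording, which matches the intuition in the abstract that unfaithful confidence language misleads the audience.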

Across three question answering (QA) benchmarks and five large language model (LLM) families, RALC improves in-domain faithfulness by up to 66% and calibration by up to 58%, outperforming both black-box and grey-box calibration baselines.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
