arXiv

Grounded Decoding: Retrieval-Anchored Probability Fusion for Faithful RAG

June 2, 2026 · Ibne Farabi Shihab, Fariya Afrin, Sanjeda Akter, Anuj Sharma

Original: arXiv:2606.00432v1 (Announce Type: new)

Abstract: As retrieval-augmented generation (RAG) systems scale, it becomes increasingly challenging to ensure faithful grounding in external evidence. Large language models may still prioritize parametric knowledge over retrieved information when conflicts arise. We propose a novel training-free decoding framework, Grounded Decoding, designed to improve factual consistency in RAG without modifying model parameters. Unlike standard approaches that rely on a single conditional distribution, our method constructs two matched-prompt distributions at every generation step: (1) a full RAG distribution conditioned on the query, retrieved documents, and generated prefix, and (2) a retrieval-only distribution conditioned solely on retrieved evidence and the same prefix. The final next-token distribution is derived as the unique solution to a KL-barycenter objective over the probability simplex, yielding a normalized geometric fusion of the two distributions. This formulation naturally recovers standard RAG when the grounding weight is zero and smoothly shifts probability mass toward retrieved evidence as grounding strength increases. We further introduce a conflict-aware adaptive weighting scheme that dynamically adjusts grounding based on distributional disagreement and retriever confidence. Experiments on ALCE, Natural Questions, and FActScore demonstrate consistent improvements in factual accuracy and citation quality over standard RAG and competitive decoding-time baselines, while maintaining fluency. Our results indicate that probability-level fusion provides a strong and efficient alternative to logit-level intervention methods for faithful RAG decoding.

Rewritten:

As retrieval-augmented generation (RAG) systems scale, keeping generated text faithfully grounded in external evidence becomes harder. When retrieved information conflicts with what a model learned during training, large language models may still favor their parametric knowledge over the retrieved context. To address this, we introduce Grounded Decoding, a training-free decoding framework that improves factual consistency in RAG without modifying model parameters.

Standard approaches decode from a single conditional distribution. Our method instead constructs two matched-prompt distributions at every generation step: (1) a full RAG distribution conditioned on the query, the retrieved documents, and the generated prefix, and (2) a retrieval-only distribution conditioned solely on the retrieved evidence and the same prefix. The final next-token distribution is the unique solution to a KL-barycenter objective over the probability simplex, which yields a normalized geometric fusion of the two distributions.
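
The fusion step can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the following assumes a weighted form (1 - lam) * KL(p || p_rag) + lam * KL(p || p_ret) with a scalar weight lam in [0, 1]; under that assumption the minimizer over the simplex is p(x) proportional to p_rag(x)^(1 - lam) * p_ret(x)^lam. The names `fuse_geometric`, `logits_rag`, `logits_ret`, and `lam` are hypothetical, and the toy logits are made up for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def fuse_geometric(logits_rag, logits_ret, lam):
    """Normalized geometric fusion of two next-token distributions.

    ASSUMPTION: the weight enters as p ~ p_rag^(1 - lam) * p_ret^lam.
    The paper's actual parameterization may differ.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    log_p_rag = log_softmax(logits_rag)  # full RAG distribution
    log_p_ret = log_softmax(logits_ret)  # retrieval-only distribution
    # A weighted sum of log-probabilities is a geometric mean in
    # probability space; the final log-softmax renormalizes it.
    mixed = [(1.0 - lam) * a + lam * b for a, b in zip(log_p_rag, log_p_ret)]
    return [math.exp(x) for x in log_softmax(mixed)]


# Toy 3-token vocabulary: the full RAG pass prefers token 0,
# the retrieval-only pass prefers token 1.
toy_rag = [2.0, 0.5, 0.1]
toy_ret = [0.1, 2.5, 0.2]
for lam in (0.0, 0.5, 1.0):
    print(lam, [round(p, 3) for p in fuse_geometric(toy_rag, toy_ret, lam)])
```

With `lam = 0.0` the sketch returns the full RAG distribution unchanged, and larger values move mass toward the token favored by the retrieval-only pass. Working in log space avoids underflow for large vocabularies.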

With a grounding weight of zero, this formulation recovers standard RAG; as the grounding strength increases, probability mass shifts smoothly toward the retrieved evidence. We further introduce a conflict-aware adaptive weighting scheme that adjusts the grounding weight dynamically, based on the disagreement between the two distributions and on retriever confidence.
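
The abstract does not specify the adaptive weighting rule, so the sketch below is only one hypothetical way to combine the two signals it names. It assumes that disagreement is measured by the Jensen-Shannon divergence between the two next-token distributions, that retriever confidence is a score in [0, 1], and that more disagreement should mean more grounding. The function names, the `lam_max` cap, and the multiplicative form are all illustrative assumptions, not the paper's scheme.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats; the value lies in [0, ln 2]."""
    def kl(a, b):
        # Terms with a(x) == 0 contribute zero by convention.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)

    m = [0.5 * (x + y) for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def adaptive_weight(p_rag, p_ret, retriever_conf, lam_max=1.0):
    """Hypothetical per-step grounding weight in [0, lam_max].

    ASSUMPTION: weight = lam_max * confidence * normalized disagreement.
    """
    disagreement = js_divergence(p_rag, p_ret) / math.log(2.0)
    disagreement = min(max(disagreement, 0.0), 1.0)  # guard rounding error
    conf = min(max(retriever_conf, 0.0), 1.0)
    return lam_max * conf * disagreement


# Identical distributions give no extra grounding; fully conflicting
# distributions with a confident retriever give the maximum weight.
print(adaptive_weight([0.7, 0.3], [0.7, 0.3], retriever_conf=0.9))
print(adaptive_weight([1.0, 0.0], [0.0, 1.0], retriever_conf=1.0))
```

Under these assumptions the weight stays at zero whenever the two distributions agree or the retriever reports no confidence, so decoding falls back to standard RAG in those cases.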

Experiments on ALCE, Natural Questions, and FActScore show consistent improvements in factual accuracy and citation quality over standard RAG and competitive decoding-time baselines, while maintaining fluency. These results indicate that probability-level fusion is a strong and efficient alternative to logit-level intervention methods for faithful RAG decoding.

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC