arXiv

Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation

Title: Promoting Faithful Reasoning in Retrieval-Augmented Generation: A Step Beyond Accuracy

Abstract:

Following the triumph of reinforcement learning (RL) in training Large Language Models (LLMs) for specialized fields such as mathematics and software engineering, researchers are increasingly focusing on equipping LLMs to dynamically plan, query, and reason using search engines as external tools. This emerging approach is widely known as agentic search. While these methods have demonstrated performance gains on standard short-form question-answering benchmarks, they often focus heavily on the accuracy of the final answer, neglecting the integrity of intermediate reasoning steps. This oversight can result in "chain-of-thought unfaithfulness," where the reasoning path does not logically support the conclusion.

In this study, we propose a comprehensive evaluation framework for agentic search that assesses faithfulness across three specific dimensions: Think-Search faithfulness, Information-Think faithfulness, and Think-Answer faithfulness. Our analysis indicates that standard agentic search systems, such as Search-R1 and ReSearch, which are trained via Reinforcement Learning from Verifiable Reward (RLVR) using episode-level outcome-based rewards, exhibit substantial deficiencies in these faithfulness metrics.

To address this issue and encourage more faithful reasoning, we present VERITAS (Verifying Entailed Reasoning through Intermediate Traceability in Agentic Search). This novel framework incorporates fine-grained, turn-level faithfulness rewards directly into the reinforcement learning training process. Experimental results demonstrate that models trained using VERITAS not only achieve markedly higher reasoning faithfulness but also outperform baseline models that rely solely on episode-level outcome-based rewards in terms of overall task performance.
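
The abstract does not say how turn-level faithfulness rewards are combined with the episode-level outcome reward. As a rough illustration only, the sketch below assumes one faithfulness score in [0, 1] per turn (for example, from a hypothetical judge model), scales each score by a hypothetical `weight`, and adds the outcome reward to the final turn. The function name, the weighting scheme, and the default value are assumptions made for this example, not the paper's formulation.

```python
from typing import List


def turn_level_rewards(
    faithfulness_scores: List[float],
    outcome_reward: float,
    weight: float = 0.5,  # hypothetical trade-off coefficient, not from the paper
) -> List[float]:
    """Return one reward per turn for a single episode.

    Each turn receives its faithfulness score scaled by `weight`.
    The episode-level outcome reward (e.g. 1.0 for a correct final
    answer, 0.0 otherwise) is credited to the last turn only.
    """
    if not faithfulness_scores:
        return []
    rewards = [weight * score for score in faithfulness_scores]
    rewards[-1] += outcome_reward
    return rewards


# Example episode: three turns, the middle one judged unfaithful,
# and a correct final answer.
example = turn_level_rewards([1.0, 0.0, 1.0], outcome_reward=1.0)
```

With an outcome-only reward, all three turns in this example would share the same signal; under the assumed scheme, the unfaithful middle turn receives no faithfulness credit even though the final answer is correct.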


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
