Global News Digest

arXiv

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

Abstract:

Contemporary LLM-based search agents are often trained with reinforcement learning with verifiable rewards (RLVR) to acquire search-augmented reasoning from outcome-based rewards. On complex tasks, however, these agents seldom produce successful end-to-end rollouts, so outcome-only RLVR suffers from a scarcity of positive-reward trajectories. We contend that improving learning on difficult problems requires additional guidance during training. RLVR systems already hold verifier-side information that can serve this purpose: it can pinpoint errors or omissions in the agent’s proposed answer and thereby direct revision within the rollout.

To leverage this, we introduce Credit-Attenuated Privileged Feedback (CAPF), a training-time framework that exposes verifier-side information through a Privileged Feedback call during training. CAPF lets the policy turn zero-reward attempts into repair trajectories that earn positive reward. It also attenuates the credit assigned to the feedback call and to the actions preceding it, so that the policy can be deployed in environments where privileged feedback is unavailable. Empirically, CAPF raises Qwen3-4B’s average exact-match score across seven open-domain question-answering benchmarks from 44.7% under standard outcome-only RLVR to 48.5%, a gain of 3.8 percentage points.
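The abstract does not give the credit-assignment rule, so the following is only an illustrative sketch of what attenuating credit for the feedback call and the actions before it could look like. The function name `attenuated_credits`, the action label `"privileged_feedback"`, the scalar `attenuation` factor, and the choice to broadcast one outcome reward to every step are all assumptions made for illustration, not details from the paper.

```python
from typing import List


def attenuated_credits(
    actions: List[str],
    outcome_reward: float,
    attenuation: float = 0.5,               # hypothetical factor, not from the paper
    feedback_action: str = "privileged_feedback",  # hypothetical action label
) -> List[float]:
    """Return one credit value per step of a rollout.

    Assumption: the outcome reward is broadcast to every step, and steps up to
    and including the last privileged-feedback call are scaled by `attenuation`.
    Steps after the last feedback call keep the full outcome reward.
    """
    feedback_positions = [i for i, a in enumerate(actions) if a == feedback_action]
    if not feedback_positions:
        # No privileged feedback was used: plain outcome-only credit.
        return [outcome_reward] * len(actions)

    last_feedback = feedback_positions[-1]
    return [
        outcome_reward * attenuation if i <= last_feedback else outcome_reward
        for i in range(len(actions))
    ]


if __name__ == "__main__":
    rollout = ["search", "answer", "privileged_feedback", "search", "answer"]
    print(attenuated_credits(rollout, outcome_reward=1.0, attenuation=0.5))
```

Under these assumptions, a rollout that succeeds only after a feedback call earns less credit for the failed first attempt and the call itself than for the repair steps that follow, which is one way to discourage the policy from relying on feedback it will not have at deployment.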


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
