arXiv

Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning

June 4, 2026 · Ziheng Li, Liu Kang, Feng Xiao, Luxi Xing, Qingyi Si, Zhuoran Li, Weikang Gong, Deqing Yang, Yanghua Xiao, Hongcheng Guo

Title: Refining Credit Assignment in Mathematical Reasoning Through Outcome-Grounded Advantage Reshaping

Abstract:

Group Relative Policy Optimization (GRPO) has recently gained traction as a critic-free reinforcement learning framework for reasoning tasks. Standard GRPO, however, assigns credit coarsely: it propagates the same group-level reward signal to every token in a sequence, ignoring the differing contributions of individual reasoning steps. To address this limitation, we propose Outcome-grounded Advantage Reshaping (OAR), a fine-grained credit assignment mechanism that redistributes advantages according to how strongly each token influences the model’s final output.
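To make the coarse-grained baseline concrete, here is a minimal sketch of how a GRPO-style group-relative advantage is commonly computed and then copied to every token. The function name `grpo_token_advantages`, the `eps` stabiliser, and the use of the population standard deviation are assumptions made for illustration; the abstract does not specify the exact normalisation.

```python
from statistics import mean, pstdev


def grpo_token_advantages(rewards, lengths, eps=1e-6):
    """Return one list of per-token advantages for each sampled response.

    rewards: scalar outcome reward per response in the group.
    lengths: number of tokens in each response.
    eps: small constant (assumed) to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # One scalar advantage per response, relative to the group.
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    # Coarse credit assignment: every token inherits the same scalar.
    return [[a] * n for a, n in zip(advantages, lengths)]


# Four sampled answers to one prompt; two are correct (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
lengths = [3, 2, 4, 1]
token_adv = grpo_token_advantages(rewards, lengths)
```

Within each response, all entries of `token_adv[i]` are identical, which is the uniformity that OAR is designed to break.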

We implement OAR through two complementary strategies: (1) OAR-P, which uses counterfactual token perturbations to estimate outcome sensitivity, providing a high-accuracy attribution signal; and (2) OAR-G, which uses an input-gradient sensitivity proxy to approximate the influence signal with a single backward pass. These importance signals feed a conservative Bi-Level advantage reshaping scheme that amplifies critical tokens and suppresses low-impact ones while preserving the total advantage mass. Extensive experiments on several mathematical reasoning benchmarks show that OAR-P sets the performance upper bound, while OAR-G achieves comparable gains at minimal computational cost. Both variants significantly outperform a strong GRPO baseline, advancing critic-free Large Language Model (LLM) reasoning.
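The following toy sketch illustrates the two ideas in this paragraph: a perturbation-based importance score in the spirit of OAR-P, and a two-level reshaping that keeps the total advantage unchanged. It is not the paper's formulation. The names `perturbation_importance`, `reshape_advantages`, and `toy_outcome`, the mean-importance threshold, and the `boost` and `damp` parameters are all hypothetical choices made for illustration.

```python
def perturbation_importance(tokens, outcome_fn, mask="<mask>"):
    """Score each token by how much masking it changes the outcome."""
    base = outcome_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append(abs(base - outcome_fn(perturbed)))
    return scores


def reshape_advantages(advantage, importance, boost=0.5, damp=0.5):
    """Two-level reweighting that preserves the summed advantage.

    Tokens above the mean importance get weight 1 + boost, the rest
    get 1 - damp; weights are rescaled so their mean is exactly 1.
    """
    if not 0.0 <= damp < 1.0:
        raise ValueError("damp must lie in [0, 1)")
    n = len(importance)
    if n == 0:
        return []
    threshold = sum(importance) / n  # assumed split rule
    raw = [1.0 + boost if s > threshold else 1.0 - damp for s in importance]
    scale = n / sum(raw)  # renormalise so the weights sum to n
    return [advantage * w * scale for w in raw]


def toy_outcome(tokens):
    """Stand-in for a model's answer confidence (purely illustrative)."""
    return 0.9 if "carry" in tokens else 0.2


tokens = ["so", "carry", "the", "one"]
importance = perturbation_importance(tokens, toy_outcome)
reshaped = reshape_advantages(1.0, importance)
```

In this example only masking `"carry"` changes the toy outcome, so that token receives the largest reshaped advantage, while the sum of `reshaped` still equals the original advantage times the number of tokens. A gradient-based proxy like OAR-G would replace `perturbation_importance` with a score derived from a single backward pass; that step is omitted here because it requires an automatic differentiation library.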

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC
