Global News Digest

arXiv

Verifying Meta-Awareness via Predictive Rewards in Reasoning Models

Abstract:

Research on reasoning models increasingly focuses on meta-awareness: knowing how long to think, recognizing the limits of one's own knowledge, and organizing reasoning at a conceptual level. Existing large reasoning models are trained solely with final-answer verification. Our findings show that adding meta-awareness objectives yields substantial performance gains over models trained without this meta-knowledge.

The proposed MAPR (Meta-Awareness via Predictive Reward) framework introduces a self-generated task in which the model predicts statistics of its own rollouts (length, pass rate, and the concepts used), and these predictions are then verified against the actual outcomes. Using this self-predictive ability, the model can regulate its reasoning in three ways: i) skipping prompts that are trivial or unsolvable, ii) cutting off long generations that are likely to be wrong, and iii) generating hints specific to the problem.
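The abstract does not give the reward formula, the thresholds, or the hint format, so the following is only a minimal sketch under our own assumptions of how predicted rollout statistics could be scored against observed ones and then used for the three mechanisms. `Prediction`, `Rollout`, `actual_stats`, `meta_reward`, `keep_prompt`, `length_budget`, `concept_hint`, and every threshold are hypothetical names and values, not taken from the MAPR codebase.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Prediction:
    """Hypothetical self-prediction format for one prompt (not MAPR's actual schema)."""
    pass_rate: float    # predicted fraction of correct rollouts, in [0, 1]
    length: int         # predicted rollout length in tokens
    concepts: Set[str]  # concepts the model expects to use


@dataclass
class Rollout:
    """One sampled solution and what was observed about it."""
    correct: bool
    length: int
    concepts: Set[str]


def actual_stats(rollouts: List[Rollout]) -> Prediction:
    """Aggregate observed rollout statistics into the same shape as a Prediction."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    n = len(rollouts)
    pass_rate = sum(r.correct for r in rollouts) / n
    mean_length = round(sum(r.length for r in rollouts) / n)
    concepts = set().union(*(r.concepts for r in rollouts))
    return Prediction(pass_rate, mean_length, concepts)


def meta_reward(pred: Prediction, actual: Prediction) -> float:
    """Assumed reward: mean of three agreement scores, each in [0, 1]."""
    pass_score = 1.0 - abs(pred.pass_rate - actual.pass_rate)
    rel_len_error = abs(pred.length - actual.length) / max(actual.length, 1)
    length_score = 1.0 - min(1.0, rel_len_error)
    union = pred.concepts | actual.concepts
    # Jaccard overlap between predicted and observed concept sets
    concept_score = len(pred.concepts & actual.concepts) / len(union) if union else 1.0
    return (pass_score + length_score + concept_score) / 3.0


def keep_prompt(pred: Prediction, low: float = 0.05, high: float = 0.95) -> bool:
    """(i) Skip prompts predicted to be trivial (near 1) or unsolvable (near 0)."""
    return low < pred.pass_rate < high


def length_budget(pred: Prediction, slack: float = 1.5) -> int:
    """(ii) Cap generation at a fixed multiple of the predicted length."""
    return int(pred.length * slack)


def concept_hint(pred: Prediction) -> str:
    """(iii) Turn predicted concepts into a short problem-specific hint."""
    if not pred.concepts:
        return ""
    return "Hint: consider " + ", ".join(sorted(pred.concepts))


if __name__ == "__main__":
    pred = Prediction(0.5, 400, {"modular arithmetic", "induction"})
    rollouts = [
        Rollout(True, 300, {"modular arithmetic"}),
        Rollout(False, 500, {"modular arithmetic", "induction"}),
    ]
    actual = actual_stats(rollouts)
    print(meta_reward(pred, actual))  # every prediction matches, so the score is 1.0
    print(keep_prompt(pred), length_budget(pred), concept_hint(pred))
```

The pass-rate gate is motivated by a property of GRPO: when every rollout in a group receives the same reward, the group-relative advantages are all zero, so such prompts contribute no reward-driven policy gradient. The 0.05/0.95 thresholds and the 1.5x length slack are arbitrary illustrative values.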

The results are encouraging: MAPR improves both accuracy and training efficiency across multiple reasoning benchmarks. It speeds up GRPO training by more than 1.28x to reach the same performance, achieves an 83.18% accuracy gain on AIME25, and improves accuracy by 13.04% on average across six mathematics benchmarks. The code is open-source and available at https://github.com/akatigre/MAPR-RL.
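As a rough reading of the speedup figure, assume it is the ratio of training cost (steps or wall-clock time; the abstract does not say which) that plain GRPO and MAPR need to reach the same accuracy. Writing that cost as $T$ (our own notation, not the paper's):

```latex
\[
\frac{T_{\mathrm{GRPO}}}{T_{\mathrm{MAPR}}} > 1.28
\;\Longrightarrow\;
T_{\mathrm{MAPR}} < \frac{T_{\mathrm{GRPO}}}{1.28} = 0.78125\,T_{\mathrm{GRPO}}
\]
```

Under that assumption, MAPR would need roughly 22% less training to match GRPO's performance.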


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators about easing ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.