arXiv

Title: SCI-PRM: A Tool-Aware Process Reward Model for Scientific Reasoning Verification

Abstract:

Process Reward Models (PRMs) have shown significant promise in mathematical reasoning, but their potential in complex scientific fields such as physics, chemistry, and biology has yet to be fully realized. Solving scientific problems requires not only logical precision but also strict factual accuracy and the correct use of specialized domain tools. Existing models frequently struggle on both counts, producing hallucinations and offering little robust verification. To bridge this gap, we introduce SCIPRM70K, a new dataset built on Chain-of-Tool trajectories that explicitly alternate between reasoning steps and the execution of scientific tools. Using this dataset, we develop Sci-PRM, an efficient reward model that provides detailed, step-by-step supervision in a single inference pass, covering the accuracy of tool selection, execution, and result interpretation. Our experiments indicate that Sci-PRM substantially improves foundation models in two ways: first, it enables effective test-time scaling through Best-of-N selection; second, when used within Reinforcement Learning frameworks, it provides a dense reward signal. This dense signal counteracts the prevalent issue of advantage disappearance, allowing models to surpass current performance limits.
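
The two uses named in the abstract can be illustrated with a minimal Python sketch. Nothing here is taken from the paper: the step scores are made-up numbers, the aggregation rule (score a trajectory by its weakest step) is an assumption, and the group-normalized advantage is borrowed from GRPO-style training, which the abstract does not name. The helper names `aggregate`, `best_of_n`, and `group_advantages` are hypothetical.

```python
from statistics import mean, pstdev


def aggregate(step_scores):
    """Collapse per-step scores into one trajectory score.

    Assumed rule: a trajectory is only as good as its weakest step.
    """
    return min(step_scores)


def best_of_n(candidates):
    """Return the index of the candidate with the highest aggregated score."""
    return max(range(len(candidates)), key=lambda i: aggregate(candidates[i]))


def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std, all zeros when std is 0."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        # Every sample got the same reward, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Hypothetical per-step scores for three sampled trajectories.
candidates = [
    [0.9, 0.2, 0.8],   # one badly scored step, e.g. a wrong tool call
    [0.7, 0.8, 0.75],  # consistently sound steps
    [0.95, 0.6, 0.4],  # weak final interpretation
]

# Test-time scaling: pick the trajectory whose weakest step is strongest.
best = best_of_n(candidates)

# Outcome-only rewards: all three answers are wrong, so advantages vanish.
sparse_adv = group_advantages([0.0, 0.0, 0.0])

# Dense rewards: mean step score still separates the trajectories.
dense_adv = group_advantages([mean(c) for c in candidates])
```

With identical outcome rewards, the normalized advantages are all zero, whereas the step-level scores still rank the trajectories. This is one plausible reading of how a dense signal could counteract the advantage disappearance the abstract mentions.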


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
