arXiv

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

June 3, 2026 · Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang

Title: Trimming the Branches: Early Path Pruning for Efficient Parallel Reasoning

Parallel reasoning significantly boosts the capabilities of Large Reasoning Models (LRMs), yet it often suffers from excessive costs driven by futile computational paths stemming from early mistakes. While pruning paths at the prefix level is a necessary mitigation strategy, current research lacks a unified framework, leaving the field fragmented. To address this gap, we introduce the first comprehensive taxonomy of path pruning techniques, classifying them according to their signal origin (internal versus external) and their capacity for learning (learnable versus non-learnable).

This classification highlights the untapped potential of learnable internal approaches, which inspired the development of STOP (Super TOken for Pruning). Our extensive testing across LRMs with parameter counts ranging from 1.5B to 20B confirms that STOP outperforms existing baselines in both effectiveness and efficiency. We also rigorously demonstrate STOP’s scalability under different compute constraints; for example, when applied to GPT-OSS-20B, it increased accuracy on the AIME25 benchmark from 84% to nearly 90% while maintaining fixed compute budgets. Finally, we synthesize our results into formalized empirical guidelines to support optimal deployment in practical scenarios. The associated code, data, and models are accessible at https://bijiaxihh.github.io/STOP.
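To make the idea of prefix-level pruning concrete, the following is a minimal sketch, not the STOP method itself. It assumes that each parallel path has produced a short prefix, and that some scoring function (here the hypothetical `score_fn`) assigns each prefix a number where higher means more promising. In the taxonomy's terms, `score_fn` could stand in for an internal or external signal, learnable or not; the sketch is agnostic. The helper `total_tokens` and its parameters are also illustrative assumptions, used only to show how pruning changes the token budget.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Path:
    prefix: str         # text generated so far for this reasoning path
    score: float = 0.0  # pruning signal assigned to the prefix


def prune_paths(
    prefixes: List[str],
    score_fn: Callable[[str], float],  # hypothetical scorer; higher = more promising
    keep: int,
) -> List[Path]:
    """Score every prefix and keep the `keep` highest-scoring paths."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    scored = [Path(p, float(score_fn(p))) for p in prefixes]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(scored, key=lambda path: path.score, reverse=True)
    return ranked[:keep]


def total_tokens(n_paths: int, prefix_len: int, full_len: int, keep: int) -> int:
    """Tokens spent when all paths generate a prefix but only `keep` paths finish.

    Assumes every path has the same prefix length and the same full length,
    which is a simplification for illustration.
    """
    return n_paths * prefix_len + keep * (full_len - prefix_len)


if __name__ == "__main__":
    # Toy scorer (an assumption for the demo): longer prefixes score higher.
    survivors = prune_paths(["a", "bbb", "cc"], score_fn=len, keep=2)
    print([p.prefix for p in survivors])

    # 8 paths, 100-token prefixes, 1000-token full traces, 2 survivors.
    print(total_tokens(8, 100, 1000, 2), "tokens vs", 8 * 1000, "without pruning")
```

Under these assumed numbers, pruning to two survivors after the prefix costs 2,600 tokens instead of 8,000. The tokens saved could instead be used to start more paths at the same total cost, which is one way to read the phrase "fixed compute budgets" above.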
