arXiv

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

Abstract

When post-trained language models fail on reasoning tasks, the standard test-time scaling response is to spend more compute on additional attempts and discard the failed trajectories. We argue that this practice wastes a useful signal. Some failures stem from sampling variance and can be fixed by drawing more samples; others are structural and persist no matter how much budget is added. We propose that failed traces carry a "recoverability structure": an inference-time indicator of which interventions can rescue a given failure. By analyzing the distributional signatures of failed rollouts rather than their textual content, we derive three trajectory-level features based on the available intervention structure. These features map the failure landscape, clustering failures into distinct, stable regimes. The method achieves $84.3{\pm}4.3\%$ accuracy, outperforming a majority-class baseline by $20\%$. It also enables a training-free routing rule that improves rescue rates by $12.2\%$ on the Steerable-Hard subset, a deployment-relevant category in which simple retries fail but bounded interventions are available. Two cross-family probes confirm the robustness of both the features and the routing rule. Together, the three features turn discarded failed traces into diagnostic tools for test-time routing and post-training analysis, without requiring access to model weights or training-time data.
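The abstract does not define the three features or the routing rule, so the following is only an illustrative sketch of the general idea of reading failures from their distribution rather than their text. It assumes each failed problem comes with a batch of sampled final answers (`None` when no answer could be parsed). The feature names (`answer_entropy`, `modal_share`, `no_answer_rate`), the 0.5 thresholds, and the `"resample"`/`"intervene"` labels are hypothetical stand-ins, not the paper's features or its routing rule.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureSignature:
    """Hypothetical trajectory-level features of a batch of failed rollouts."""
    answer_entropy: float  # Shannon entropy of final answers, normalized to [0, 1]
    modal_share: float     # fraction of parsed answers equal to the most common one
    no_answer_rate: float  # fraction of rollouts with no parseable final answer


def signature(final_answers: List[Optional[str]]) -> FailureSignature:
    """Summarize rollouts by their answer distribution; reasoning text is never read."""
    n = len(final_answers)
    if n == 0:
        raise ValueError("need at least one rollout")
    parsed = [a for a in final_answers if a is not None]
    no_answer_rate = 1 - len(parsed) / n
    if not parsed:
        return FailureSignature(0.0, 0.0, no_answer_rate)
    counts = Counter(parsed)
    probs = [c / len(parsed) for c in counts.values()]
    entropy = sum(-p * math.log(p) for p in probs)
    # Normalize by log(#parsed answers) so batches of different sizes are comparable.
    max_entropy = math.log(len(parsed)) if len(parsed) > 1 else 1.0
    return FailureSignature(
        answer_entropy=entropy / max_entropy,
        modal_share=counts.most_common(1)[0][1] / len(parsed),
        no_answer_rate=no_answer_rate,
    )


def route(sig: FailureSignature, entropy_threshold: float = 0.5) -> str:
    """Hypothetical training-free routing rule with illustrative thresholds.

    Dispersed wrong answers suggest sampling variance, so more samples may help.
    Concentrated wrong answers (or mostly unparseable outputs) suggest a
    structural failure that retries are unlikely to fix, so try an intervention.
    """
    if sig.no_answer_rate >= 0.5:
        return "intervene"
    if sig.answer_entropy >= entropy_threshold:
        return "resample"
    return "intervene"


# Wrong answers scattered across rollouts: looks like sampling variance.
print(route(signature(["12", "15", "9", "12", "7", "20"])))  # -> resample
# Rollouts converge on the same wrong answer: looks structural.
print(route(signature(["42", "42", "42", "42", "41", "42"])))  # -> intervene
```

A single entropy threshold keeps the rule training-free, matching the abstract's framing, but any real deployment would need thresholds (and features) calibrated against observed rescue outcomes.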


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC