arXiv

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Abstract:

Current strategies for improving model reasoning often rely on post-training techniques. While effective, these methods typically demand expensive training infrastructure and frequently result in outputs that are both inefficient and excessively long. To address these limitations, we present Speculative Thinking, a novel training-free framework. Unlike speculative decoding, which functions at the token level, our approach operates at the reasoning level, allowing large reasoning models to steer smaller counterparts during inference.

This methodology is grounded in two key observations. First, tokens that support reasoning, such as "wait," tend to follow structural markers like "\n\n," acting as indicators for reflection or continuation. Second, larger models demonstrate superior control over reflective processes, which minimizes unnecessary backtracking and enhances the overall quality of reasoning. By strategically offloading reflective steps to a more capable model, we significantly improve the reasoning accuracy of smaller models while simultaneously reducing their output volume.
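The offloading idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how much text the larger model writes or how control returns to the smaller one. The sketch assumes generation proceeds in segments separated by "\n\n", and that a segment opening with a reflective cue is handed to the larger model. The names `speculative_thinking`, `is_reflective`, `make_scripted`, and `REFLECTIVE_CUES` are hypothetical, and the two "models" are scripted stand-ins rather than real language models.

```python
from typing import Callable, List, Tuple

# Hypothetical cue list: the abstract names only "wait" explicitly.
REFLECTIVE_CUES = ("wait",)

# A "model" here is any callable mapping the text so far to the next
# segment, returning "" when it has nothing more to say (an assumption).
Generator = Callable[[str], str]


def is_reflective(segment: str) -> bool:
    """Return True if the segment opens with a reflective cue such as 'wait'."""
    return segment.lstrip().lower().startswith(REFLECTIVE_CUES)


def speculative_thinking(
    prompt: str, small: Generator, large: Generator, max_segments: int = 32
) -> Tuple[str, List[str]]:
    """Build a response segment by segment, delegating reflective steps."""
    text = prompt
    writers: List[str] = []
    for _ in range(max_segments):
        draft = small(text)  # the small model proposes the next segment
        if not draft:
            break
        if is_reflective(draft):
            # Reflection follows the "\n\n" boundary: let the large model
            # write this step, falling back to the draft if it returns "".
            segment = large(text) or draft
            writers.append("large")
        else:
            segment = draft
            writers.append("small")
        text += segment + "\n\n"
    return text, writers


def make_scripted(segments: List[str]) -> Generator:
    """Toy stand-in for a model: replays fixed segments, ignoring the context."""
    it = iter(segments)
    return lambda _text: next(it, "")


small = make_scripted(
    ["Compute 2 + 2 = 5.", "Wait, maybe that is right?", "So the answer is 4."]
)
large = make_scripted(["Wait, 2 + 2 = 4, not 5."])

text, writers = speculative_thinking("What is 2 + 2?\n\n", small, large)
print(writers)
```

In this toy run only the second segment is delegated, so the larger model is called once while the smaller model writes the rest. A real system would also need a rule for when the larger model stops and hands control back, which the abstract leaves unspecified.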

Empirical results highlight the framework's effectiveness. When guided by a 32B reasoning model, a 1.5B model’s accuracy on the MATH500 benchmark rose from 83.2% to 89.4%, a gain of 6.2 percentage points. Concurrently, the average output length decreased by 15.7%, dropping from 5,439 tokens to 4,583 tokens. The framework also benefited non-reasoning models: applying it to Qwen-2.5-7B-Instruct increased its accuracy on the same benchmark from 74.0% to 81.8%, an absolute improvement of 7.8 percentage points.
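The reported figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the helper names `percentage_point_gain` and `relative_change` are hypothetical. It separates an absolute difference in accuracy (percentage points) from a relative change (percent of the starting value), since the two are easy to conflate.

```python
def percentage_point_gain(before: float, after: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, in percent."""
    return (after - before) / before * 100.0


# 1.5B model on MATH500: 83.2% -> 89.4%
small_gain = round(percentage_point_gain(83.2, 89.4), 1)  # 6.2 points

# Average output length: 5,439 -> 4,583 tokens
length_change = round(relative_change(5439, 4583), 1)  # -15.7 percent

# Qwen-2.5-7B-Instruct on MATH500: 74.0% -> 81.8%
qwen_gain = round(percentage_point_gain(74.0, 81.8), 1)  # 7.8 points
qwen_relative = round(relative_change(74.0, 81.8), 1)  # 10.5 percent

print(small_gain, length_change, qwen_gain, qwen_relative)
```

The 6.2 and 7.8 figures are absolute differences in accuracy, while the 15.7% length reduction is a relative change. Measured relative to its 74.0% baseline, the Qwen-2.5-7B-Instruct gain is about 10.5%.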


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
