arXiv

Reasoning Shift: How Context Silently Shortens LLM Reasoning

Abstract:

While large language models (LLMs) that exhibit test-time scaling behaviors—such as extended reasoning traces and self-verification—have achieved impressive results on complex, long-horizon reasoning tasks, the robustness of these capabilities has not been thoroughly examined. To address this gap, we perform a systematic evaluation of several reasoning models across three distinct scenarios: problems embedded within lengthy, irrelevant context; multi-turn conversations involving independent tasks; and problems framed as subtasks within larger, complex objectives.

Our analysis uncovers a notable trend: when the same problem is presented under any of these contextual conditions, reasoning models produce significantly shorter reasoning traces (up to 65% shorter) than when the problem is presented in isolation. A finer-grained analysis indicates that this compression correlates with a decline in self-verification and uncertainty-management behaviors, such as double-checking work. Although this shift in behavior does not affect performance on simpler tasks, it may reduce performance on more difficult problems. Furthermore, we demonstrate that targeted supervised fine-tuning can partially alleviate the negative impact of irrelevant context. We hope these findings draw attention to the robustness of reasoning models and to the problem of context management in LLMs and LLM-based agents.
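To make the "up to 65% shorter" figure concrete, here is a minimal sketch of how a relative reduction in reasoning-trace length could be computed between the isolated and in-context conditions. The function name `relative_reduction` and the token counts are hypothetical and chosen for illustration only; they are not the paper's code or data, and the paper may measure length differently.

```python
from statistics import mean


def relative_reduction(baseline_lengths, context_lengths):
    """Fractional drop in mean trace length relative to the isolated baseline."""
    if not baseline_lengths or not context_lengths:
        raise ValueError("both conditions need at least one trace length")
    base = mean(baseline_lengths)
    if base <= 0:
        raise ValueError("baseline mean length must be positive")
    return 1.0 - mean(context_lengths) / base


# Illustrative token counts only; these are NOT measurements from the paper.
isolated = [1000, 1200, 800]      # same problems, presented alone
with_context = [400, 300, 350]    # same problems, embedded in extra context

print(f"{relative_reduction(isolated, with_context):.0%}")
```

A value of 0 would mean the traces are the same length on average, and a negative value would mean the in-context traces are longer.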


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
