arXiv

Inducing Reasoning Primitives from Agent Traces

Title: Deriving Reasoning Primitives from Agent Traces

Original: arXiv:2606.02994v1 Announce Type: new Abstract: ReAct-style LLM agents often rediscover the same reasoning routines across problems, yet leave those routines trapped in transient scratchpads. We introduce Reasoning Primitive Induction, a single-pass method that mines successful ReAct traces, clusters recurrent reasoning moves, and converts the most frequent moves into a compact library of typed pseudo-tools. Each pseudo-tool is specified by a natural-language docstring interpreted by an LLM at invocation time, and a standard ReAct loop composes these primitives at test time. The central result is that induced libraries outperform the very agent that generated their traces: by +44pp on RuleArena NBA (30 -> 74), +30pp on MuSR team allocation (38 -> 68), and +22pp on NatPlan meeting planning (7 -> 29). Across five comparable subtasks spanning narrative deduction, rule application, and constraint-satisfaction planning, a single fixed configuration improves over zero-shot Chain-of-Thought on every subtask, matches or surpasses expert-authored decompositions, and outperforms AWM at lower average inference cost.

Rewrite: arXiv:2606.02994v1 Announce Type: new Abstract: ReAct-style LLM agents often rediscover the same reasoning routines from one problem to the next, but those routines stay confined to transient scratchpads and are never retained. To address this, we present Reasoning Primitive Induction, a single-pass method that mines successful ReAct traces, clusters recurring reasoning moves, and converts the most frequent ones into a compact library of typed pseudo-tools. Each pseudo-tool is defined by a natural-language docstring that an LLM interprets when the tool is invoked, and a standard ReAct loop composes these primitives at test time. The central finding is that the induced libraries outperform the very agent that generated their traces, with gains of +44 percentage points on RuleArena NBA (from 30 to 74), +30 percentage points on MuSR team allocation (from 38 to 68), and +22 percentage points on NatPlan meeting planning (from 7 to 29). Across five comparable subtasks spanning narrative deduction, rule application, and constraint-satisfaction planning, a single fixed configuration improves over zero-shot Chain-of-Thought on every subtask, matches or surpasses expert-authored decompositions, and outperforms AWM at a lower average inference cost.
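To make the mine, cluster, and convert steps in the abstract concrete, here is a minimal sketch. It is not the authors' implementation. The trace format, the `PseudoTool` fields, and the helpers `normalize` and `induce_library` are hypothetical names introduced for illustration. Clustering is reduced to exact matching on normalized step text, a stand-in for whatever clustering the paper actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoTool:
    name: str
    docstring: str  # natural-language spec; an LLM would interpret it at call time
    support: int    # how many mined steps fell into this cluster


def normalize(step: str) -> str:
    """Crude clustering key (assumption): lowercase and collapse whitespace."""
    return " ".join(step.lower().split())


def induce_library(traces, top_k=2):
    """One pass over the traces: keep successful ones, count recurring
    moves, and turn the top_k most frequent moves into pseudo-tools."""
    counts = Counter()
    for trace in traces:
        if not trace["success"]:
            continue  # only successful traces are mined
        for step in trace["steps"]:
            counts[normalize(step)] += 1
    return [
        PseudoTool(
            name=key.replace(" ", "_"),
            docstring=f"Perform this reasoning move: {key}.",
            support=n,
        )
        for key, n in counts.most_common(top_k)
    ]


# Toy traces, invented for this example only.
TRACES = [
    {"success": True, "steps": ["List constraints", "Check each rule", "Compute total"]},
    {"success": True, "steps": ["list constraints", "Check each rule"]},
    {"success": False, "steps": ["Guess answer", "Guess answer", "Guess answer", "Guess answer"]},
    {"success": True, "steps": ["Check  each rule"]},
]

library = induce_library(TRACES, top_k=2)
for tool in library:
    print(tool.name, tool.support)
```

In this toy run, the failed trace contributes nothing, so its repeated "Guess answer" step never becomes a tool even though it is the most frequent step overall. At test time, a ReAct loop would expose each `PseudoTool` as a callable action whose docstring is passed to an LLM. That part is omitted here because it depends on a model API.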


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
