arXiv

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

June 3, 2026 · Yu Xia, Zhouhang Xie, Xin Xu, Byungkyu Kang, Prarit Lamba, Xiang Gao, Julian McAuley

Abstract:

While extended chain-of-thought reasoning significantly boosts the final-answer accuracy of large language models, it frequently results in inefficient token consumption and lacks precise control during inference. Current approaches to reasoning efficiency typically manage thinking length by truncating, early-stopping, or compressing reasoning traces, which leaves the model’s internal thought process largely opaque. To address these limitations, we introduce Agentic Chain-of-Thought Steering (ACTS). This method frames reasoning steering as a Markov decision process, employing a controller agent that adaptively guides a frozen reasoner during the inference phase. At every step, the controller monitors both the ongoing reasoning trace and the remaining thinking budget, subsequently issuing a steering action composed of a specific reasoning strategy and a steering phrase designed to prompt the next step from the reasoner. This mechanism allows for budget-aware strategy management, ensuring efficient reasoning while maintaining the continuity of the reasoner’s generation. We initialize the controller agent using synthetic steering trajectories constructed with multi-budget augmentation and further refine it through reinforcement learning, utilizing budget-conditioned reward shaping. Our experiments across various benchmarks demonstrate that ACTS achieves performance comparable to full-thinking methods while delivering significant token savings. Furthermore, it facilitates controllable trade-offs between accuracy and efficiency across diverse reasoners and tasks. The code is available at https://github.com/Andree-9/ACTS.
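
The control loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `State`, `SteeringAction`, `steer`, `toy_controller`, and `toy_reasoner`, the strategy labels `"continue"` and `"conclude"`, the steering phrases, and the termination rule are all assumptions made for illustration. The sketch shows only the interaction pattern: a controller reads the trace and the remaining budget, emits a strategy plus a steering phrase, and a frozen reasoner continues from that phrase.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SteeringAction:
    strategy: str  # hypothetical label, e.g. "continue" or "conclude"
    phrase: str    # text appended to the trace to prompt the next step


@dataclass
class State:
    trace: str        # reasoning generated so far
    budget_left: int  # remaining thinking budget, in tokens


# Assumed interfaces: the controller maps a state to an action; the frozen
# reasoner maps (prompt, max_tokens) to (generated_text, tokens_used).
Controller = Callable[[State], SteeringAction]
Reasoner = Callable[[str, int], Tuple[str, int]]


def steer(question: str, controller: Controller, reasoner: Reasoner,
          budget: int, step_cap: int = 64) -> Tuple[str, List[SteeringAction], int]:
    """Roll out one steering episode until the budget runs out or the
    controller picks the (assumed) terminal strategy "conclude"."""
    state = State(trace=question, budget_left=budget)
    actions: List[SteeringAction] = []
    while state.budget_left > 0:
        action = controller(state)
        actions.append(action)
        # The steering phrase is appended to the trace, so the reasoner
        # continues its own generation instead of restarting.
        prompt = state.trace + "\n" + action.phrase
        text, used = reasoner(prompt, min(step_cap, state.budget_left))
        used = max(1, used)  # guarantee progress even if nothing was generated
        state = State(trace=prompt + text, budget_left=state.budget_left - used)
        if action.strategy == "conclude":
            break
    return state.trace, actions, state.budget_left


def toy_controller(state: State) -> SteeringAction:
    # Rule-based stand-in for the learned controller: wrap up when budget is low.
    if state.budget_left <= 20:
        return SteeringAction("conclude", "Therefore, the final answer is")
    return SteeringAction("continue", "Next,")


def toy_reasoner(prompt: str, max_tokens: int) -> Tuple[str, int]:
    # Stand-in for a frozen LLM: emits a placeholder step costing up to 10 tokens.
    return " [step]", min(max_tokens, 10)


if __name__ == "__main__":
    trace, actions, left = steer("Q: 2+2?", toy_controller, toy_reasoner, budget=50)
    print([a.strategy for a in actions], left)
```

In the paper's setting the rule-based `toy_controller` would be replaced by the trained controller agent and `toy_reasoner` by a call to the frozen reasoning model; the loop structure is the only part this sketch is meant to convey.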

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC