arXiv

SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning

June 2, 2026 · Chenzhi Hu, Qinzhe Hu, Yuhang Xu, Junyi Chen, Ruijie Wang, Shengzhong Liu, Jianxin Li, Fan Wu, Guihai Chen

Abstract:

High-performing Large Reasoning Models (LRMs), such as DeepSeek-R1 and OpenAI o1, achieve strong accuracy on complex tasks by generating extended chain-of-thought (CoT) reasoning trajectories. However, these trajectories are often verbose, containing redundant steps and signs of overthinking. Prior work has used Group Relative Policy Optimization (GRPO) to shorten LRM outputs, but its static length rewards do not adapt to relative problem difficulty or to the distribution of response lengths. This rigidity frequently causes over-compression and a resulting drop in accuracy.
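
To make the rigidity concrete, here is a minimal Python sketch of a GRPO-style training signal with a static length reward. It assumes one common form of length penalty (a correctness reward of 1 or 0 minus a fixed coefficient `alpha` times the normalized length); the prior methods the abstract refers to may use different forms. The names `static_length_reward` and `group_advantages` are hypothetical. The advantage uses GRPO's group-relative standardization, computed here with the population standard deviation.

```python
from statistics import mean, pstdev


def static_length_reward(correct: bool, length: int, max_len: int, alpha: float = 0.5) -> float:
    """Hypothetical static reward: correctness minus a length penalty whose
    coefficient alpha is FIXED, identical for every prompt regardless of difficulty."""
    return (1.0 if correct else 0.0) - alpha * (length / max_len)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantage: standardize the rewards of the
    samples drawn for one prompt against that group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled responses: (correct?, length in tokens).
group = [(True, 2000), (True, 6000), (False, 1000), (False, 8000)]
rewards = [static_length_reward(c, n, max_len=8192) for c, n in group]
advantages = group_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Because `alpha` is fixed, a hard prompt whose only correct samples are long receives the same pressure toward brevity as an easy one, which is the kind of over-compression described above.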

To overcome these limitations, we introduce SmartThinker, a GRPO-based approach to efficient reasoning that incorporates progressive CoT length calibration. SmartThinker makes two primary contributions. First, during training it dynamically identifies the response length at which accuracy peaks and steers overly long responses toward this target, reducing length without sacrificing accuracy. Second, it dynamically adjusts the length reward coefficient to avoid unjustly penalizing valid reasoning paths. Extensive experiments show that SmartThinker reduces average response length by up to 52.5% while also improving accuracy, with accuracy gains of up to 16.6% on challenging benchmarks such as AIME25. The source code is available at https://github.com/SJTU-RTEAS/SmartThinker.
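
The two mechanisms can be illustrated with a hedged Python sketch. Everything in it is an assumption for illustration, not SmartThinker's actual reward (the linked repository holds the real implementation). The hypothetical `peak_accuracy_length` buckets recent rollouts by length and takes the bucket with the highest accuracy as the target length. The hypothetical `calibrated_reward` penalizes only correct responses longer than that target, with a coefficient scaled by the group's accuracy so that hard prompts are penalized less. Re-estimating the target on each new batch of rollouts is what would make such a calibration progressive.

```python
from collections import defaultdict


def peak_accuracy_length(samples: list[tuple[int, bool]], bucket: int = 1024) -> int:
    """Hypothetical target-length estimate: group (length, correct) rollouts into
    fixed-width length buckets and return the upper edge of the bucket with the
    highest accuracy. Ties go to the shorter bucket."""
    if not samples:
        raise ValueError("need at least one rollout")
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for length, correct in samples:
        b = length // bucket
        totals[b] += 1
        hits[b] += int(correct)
    # Highest accuracy first; among equal accuracies, the smallest bucket index.
    best = min(totals, key=lambda b: (-hits[b] / totals[b], b))
    return (best + 1) * bucket


def calibrated_reward(correct: bool, length: int, target: int,
                      group_acc: float, base_alpha: float = 0.5) -> float:
    """Hypothetical calibrated reward.
    - Incorrect responses get 0 and correct ones at or under `target` get 1,
      so short valid reasoning is never penalized.
    - Correct responses over `target` lose alpha * overshoot (capped at 1),
      pulling them toward the target rather than toward zero length.
    - alpha = base_alpha * group_acc, so low-accuracy (hard) groups are penalized less."""
    if not correct:
        return 0.0
    if length <= target:
        return 1.0
    alpha = base_alpha * group_acc
    overshoot = (length - target) / target
    return 1.0 - alpha * min(overshoot, 1.0)


rollouts = [(900, False), (1500, True), (1800, True), (2500, True),
            (3000, False), (5000, False), (6000, True)]
target = peak_accuracy_length(rollouts)
print(target, calibrated_reward(True, 3072, target, group_acc=1.0),
      calibrated_reward(True, 3072, target, group_acc=0.25))
```

With these rollouts and 1024-token buckets, the 1024 to 2047 bucket and the 5120 to 6143 bucket both reach 100% accuracy; the tie goes to the shorter bucket, giving a target of 2048. A correct 3072-token response then scores 0.75 when the whole group is correct, but 0.9375 when group accuracy is only 0.25.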
