Can Reasoning Path still be Effective as Input? Bridging Post-Reasoning to Chain-of-Thought Compression
Title: Can Reasoning Paths Remain Effective as Input? Connecting Post-Reasoning with Chain-of-Thought Compression
Abstract: The emergence of extended Chain-of-Thought (CoT) sequences has significantly boosted the reasoning capabilities of Large Language Models (LLMs), though this comes at the cost of increased inference time and reduced efficiency. Current approaches to mitigating this issue typically involve compressing the generated CoT; however, such methods often strip away critical information required to reach the correct solution. To address this, we introduce a novel reasoning framework called "post-reasoning," which integrates CoT directly into the context to streamline the reasoning process for LLMs. While our findings indicate that post-reasoning substantially shortens the output length of LLMs, its success depends heavily on the quality and efficiency of the contextual CoT provided. Consequently, we present Upfront CoT (UCoT), an efficient post-reasoning architecture designed for CoT compression. UCoT employs a lightweight compressor model to generate contextual CoT in the form of soft tokens, which are then utilized by the main LLM (the executor) to derive the final answer. Comprehensive experiments demonstrate that UCoT preserves the robust reasoning skills of the executor while markedly decreasing CoT length. Notably, when applied to the Qwen2.5-7B-Instruct model on the GSM8K dataset, UCoT cuts token usage by 50% and outperforms the current state-of-the-art (SOTA) method by 3.08%.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
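
The compressor/executor split described in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the abstract does not specify how the compressor produces soft tokens or where they are placed in the executor's context. The function names (`compress_to_soft_tokens`, `build_executor_input`), the mean-pooling step, the random projection matrices, and the choice to put the soft tokens before the question are all assumptions made for illustration only.

```python
import random


def compress_to_soft_tokens(question_embeddings, projections):
    """Hypothetical compressor: map a question to a few continuous vectors.

    Assumption: mean-pool the question embeddings, then apply one
    projection matrix per soft token. A real compressor would be a
    trained lightweight model, not a fixed random projection.
    """
    dim = len(question_embeddings[0])
    count = len(question_embeddings)
    pooled = [sum(vec[d] for vec in question_embeddings) / count for d in range(dim)]
    soft_tokens = []
    for matrix in projections:
        # Matrix-vector product: one soft token per projection matrix.
        soft_tokens.append(
            [sum(matrix[i][j] * pooled[j] for j in range(dim)) for i in range(dim)]
        )
    return soft_tokens


def build_executor_input(soft_tokens, question_embeddings):
    """Assumed layout: soft tokens are placed in context ahead of the question."""
    return soft_tokens + question_embeddings


# Toy setup: a 5-token question with 4-dimensional embeddings, 2 soft tokens.
rng = random.Random(0)
DIM, QUESTION_LEN, NUM_SOFT = 4, 5, 2
question = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(QUESTION_LEN)]
projections = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_SOFT)
]

soft = compress_to_soft_tokens(question, projections)
executor_input = build_executor_input(soft, question)
```

In this sketch the executor receives `NUM_SOFT + QUESTION_LEN` embedding vectors as context. The intended point is only that the contextual CoT arrives as a small, fixed number of continuous vectors rather than as a long sequence of generated text tokens.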




