arXiv

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Abstract:

While Reinforcement Learning with Verifiable Rewards (RLVR) applied to Chains of Thought (CoTs) has driven significant advancements in Large Reasoning Models (LRMs), these models suffer from "over-thinking." This issue arises because long CoTs inherently involve trial and error, and standard RLVR methods reinforce the entire trajectory, including redundant explorations, when selecting outcome-correct paths for memorization. Although prior efforts have attempted to address this by favoring shorter trajectories, their reliance on outcome-based signals fails to eliminate the memorization of unnecessary steps within longer chains. To overcome this limitation, we introduce ThoughtFold, a framework that uses fine-grained preference learning to curb redundant exploration and enhance reasoning efficiency. ThoughtFold employs an introspective mechanism to pinpoint redundancies within every correct trajectory, generating a diverse set of candidate sub-trajectories. Based on this spectrum, we propose a masked preference optimization objective that actively penalizes redundant actions and incentivizes the model to connect key reasoning steps directly. This process effectively "folds" the reasoning chain into a more streamlined path. Our extensive experiments demonstrate that ThoughtFold markedly improves efficiency; specifically, it cuts the token consumption of DeepSeek-R1-Distill-Qwen-7B by roughly 56% without compromising its state-of-the-art accuracy.
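
The abstract does not state the form of the masked preference optimization objective. As a rough illustration only, the sketch below assumes a DPO-style pairwise loss in which a binary token mask decides which tokens of each trajectory contribute to the log-ratio. The function names, the `beta` parameter, the mask semantics, and the toy numbers are all assumptions made for this example and are not taken from the paper.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def masked_logratio(policy_logps, ref_logps, mask):
    # Sum of (policy - reference) token log-probs over positions where mask == 1.
    return sum(m * (p - r) for p, r, m in zip(policy_logps, ref_logps, mask))

def masked_preference_loss(chosen, rejected, beta=0.1):
    # chosen / rejected: (policy_logps, ref_logps, mask) per trajectory.
    # Hypothetical reading: "chosen" is a folded (shorter) sub-trajectory and
    # "rejected" is the original trajectory containing redundant steps.
    margin = beta * (masked_logratio(*chosen) - masked_logratio(*rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)

# Toy per-token log-probs; mask 0 marks tokens excluded from the objective
# (assumed here to be a prefix shared by both trajectories).
chosen = ([-1.0, -0.5], [-1.0, -1.0], [0, 1])
rejected = ([-1.0, -0.2, -0.3], [-1.0, -1.0, -1.0], [0, 1, 1])
loss = masked_preference_loss(chosen, rejected)
```

In this toy case the policy still favors the redundant tokens more than the reference does, so the loss sits above log 2, the value obtained when the policy and reference agree. Minimizing such a loss would push probability mass away from the masked-in redundant tokens and toward the folded path, which is the qualitative behavior the abstract describes.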


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
