arXiv

Enhancing LLM Metacognition via Cognitive Pairwise Training

June 2, 2026 · Weitao Li, Hao Zhou, Xuanyu Lei, Fandong Meng, Yuanhang Liu, Jingyi Ren, Ante Wang, Xiaolong Wang, Yuanchi Zhang, Fuwen Luo, Guangwen Yang, Lin Gan, Weizhi Ma, Yang Liu

Title: Boosting LLM Metacognition Through Cognitive Pairwise Training

Original: arXiv:2606.00869v1. Abstract: Reinforcement learning with verifiable rewards (RLVR) has become central to LLM reasoning, but its outcome-level rewards can make models more willing to give confident answers when evidence or reasoning is unreliable. Existing SFT or RL methods mainly teach LLMs to refuse or express uncertainty at the response level, which can overfit abstention behavior rather than improve reasoning reliability. To address this limitation, we propose Cognitive Pairwise Training (CPT), a cognitive mid-training alignment stage that turns pairwise comparisons over reasoning traces into a reusable alignment signal. By learning to distinguish trustworthy from flawed reasoning, CPT encourages the model to internalize a reasoning-quality discrimination boundary rather than memorize surface refusal patterns. Across five model scales and three model families, CPT improves the reasoning-metacognition trade-off. At 14B, CPT+RL outperforms the standard SFT+RL pipeline by +2.2 math-average points and +5.2 abstention-F1 points. Further analyses show that CPT improves trace quality and exhibits strong robustness and scalability across evaluation and training settings. Code and models are released at https://github.com/Tsinghua-dhy/CPT.

Rewrite:

Reinforcement learning with verifiable rewards (RLVR) has become central to reasoning in large language models (LLMs). Its outcome-level rewards, however, can make models more willing to answer confidently even when the evidence or reasoning is unreliable. Existing supervised fine-tuning (SFT) and reinforcement learning (RL) methods mainly teach LLMs to refuse or express uncertainty at the response level, which can overfit abstention behavior rather than improve the reliability of the reasoning itself.

To address this limitation, we introduce Cognitive Pairwise Training (CPT), a cognitive mid-training alignment stage that turns pairwise comparisons over reasoning traces into a reusable alignment signal. By learning to distinguish trustworthy reasoning from flawed reasoning, the model is encouraged to internalize a decision boundary for reasoning quality rather than memorize surface refusal patterns.
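The abstract does not state which objective CPT optimizes. Purely as an illustration of how a pairwise comparison over two reasoning traces can become a training signal, the sketch below uses a Bradley-Terry style logistic loss, a common choice for preference data and an assumption here, not the authors' method. The function names and the scalar "quality scores" are hypothetical.

```python
import math


def pairwise_logistic_loss(score_sound: float, score_flawed: float) -> float:
    """Return -log(sigmoid(score_sound - score_flawed)).

    Assumption: each reasoning trace has already been mapped to a scalar
    quality score by some model. The loss is small when the trustworthy
    trace is scored well above the flawed one.
    """
    margin = score_sound - score_flawed
    # Numerically stable evaluation of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the loss over (score_sound, score_flawed) pairs."""
    return sum(pairwise_logistic_loss(s, f) for s, f in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores: the first pair is ranked correctly, the second is not.
    print(mean_pairwise_loss([(2.0, 0.0), (0.0, 2.0)]))
```

A correctly ranked pair contributes a loss below log 2, a tie contributes exactly log 2, and a misranked pair contributes more, so minimizing the average pushes the scorer toward separating sound traces from flawed ones.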

Across three model families and five model scales, CPT improves the trade-off between reasoning and metacognition. At 14B, CPT+RL outperforms the standard SFT+RL pipeline by 2.2 points on the math average and 5.2 points on abstention F1. Further analyses show that CPT improves trace quality and remains robust and scalable across evaluation and training settings. Code and models are released at https://github.com/Tsinghua-dhy/CPT.
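For readers unfamiliar with the metric, the sketch below shows one plausible reading of abstention F1: a standard F1 score in which "the model should abstain" is the positive class. The abstract does not define the metric, so this definition, the function name, and the example labels are all assumptions made for illustration.

```python
def abstention_f1(should_abstain: list[bool], did_abstain: list[bool]) -> float:
    """F1 score with abstention treated as the positive class (assumed definition)."""
    pairs = list(zip(should_abstain, did_abstain))
    true_pos = sum(1 for gold, pred in pairs if gold and pred)
    false_pos = sum(1 for gold, pred in pairs if not gold and pred)
    false_neg = sum(1 for gold, pred in pairs if gold and not pred)
    if true_pos == 0:
        # No correct abstentions: precision and recall are zero or undefined.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for five questions.
    gold = [True, True, False, False, True]
    pred = [True, False, True, False, True]
    print(abstention_f1(gold, pred))
```

In this made-up example there are two correct abstentions, one unnecessary abstention, and one missed abstention, so precision and recall are both 2/3 and the F1 score is 2/3.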

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC