arXiv

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

June 2, 2026 · Wei Tian, Yuhao Zhou, Man Lan

Title: CSRP: Leveraging Chain-of-Thought Reasoning and Efficiency-Aware Reinforcement Learning for Accurate Chinese Text Correction

Abstract

Chinese Grammatical Error Correction (CGEC) systems powered by Large Language Models (LLMs) face two significant hurdles. First, generic models often lack the specialized linguistic priors needed to handle subtle grammatical distinctions. Second, Supervised Fine-Tuning (SFT) with Maximum Likelihood Estimation (MLE) does not optimize for precision-oriented metrics, which frequently results in systematic over-correction. To address these issues, we introduce CSRP, a three-phase framework designed to incrementally enhance correction capabilities. The first phase is Continual Pre-training (CPT) on a balanced dataset of 5.9 million samples to embed domain-specific knowledge. The second is Chain-of-Thought (CoT) SFT, which employs explicit error reasoning to ensure diagnostic transparency. The third is Group Relative Policy Optimization (GRPO) with a novel Efficiency-Aware Reward mechanism that explicitly penalizes superfluous edits.
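The abstract does not give the form of the Efficiency-Aware Reward, so the following is only a minimal sketch of the idea under assumed definitions. The names `edit_ops`, `efficiency_aware_reward`, `group_relative_advantages`, and the `penalty` weight are hypothetical and are not taken from the paper or its repository. The sketch scores a candidate by the fraction of reference edits it reproduces, subtracts a fixed penalty for each edit absent from the reference, and then normalizes rewards within a sampled group, which is the usual way GRPO turns rewards into advantages.

```python
import difflib
import statistics


def edit_ops(src, tgt):
    """Character-level edits turning src into tgt, as a set of hashable tuples."""
    sm = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    return {
        (tag, i1, i2, tgt[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    }


def efficiency_aware_reward(src, hyp, ref, penalty=0.5):
    """Hypothetical reward: coverage of reference edits minus a cost per extra edit.

    ASSUMPTION: this functional form and the penalty value are illustrative only.
    """
    hyp_edits = edit_ops(src, hyp)
    ref_edits = edit_ops(src, ref)
    superfluous = len(hyp_edits - ref_edits)
    if ref_edits:
        coverage = len(hyp_edits & ref_edits) / len(ref_edits)
    else:
        # The source is already correct: leaving it untouched is the ideal output.
        coverage = 1.0 if not hyp_edits else 0.0
    return coverage - penalty * superfluous


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    src = "我今天很高心"   # contains one error: 心 should be 兴
    ref = "我今天很高兴"
    group = [
        ref,                # minimal, correct fix
        "我今天非常高兴",    # correct fix plus an unnecessary rewrite
        src,                # no correction at all
    ]
    rewards = [efficiency_aware_reward(src, hyp, ref) for hyp in group]
    for hyp, r, a in zip(group, rewards, group_relative_advantages(rewards)):
        print(hyp, round(r, 3), round(a, 3))
```

In this toy group, the candidate that fixes the error with a single edit receives the highest advantage, while the candidate that also rewrites a correct span is pushed below it. That ordering is the behavior the abstract attributes to penalizing superfluous edits.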

Evaluations on the NACGEC benchmark reveal that CSRP attains state-of-the-art performance, recording an $F_{0.5}$ score of 50.99 and a precision of 57.17. These results significantly surpass prior bests and effectively counteract the over-correction bias commonly found in MLE-trained models. Furthermore, our method elevates Chinese Spelling Correction (CSCD) performance to an F1 score of 59.61, outperforming GPT-4 by 5.20 points. Extensive ablation studies confirm that the RL alignment phase yields an 8% relative improvement over the SFT baseline. Crucially, this gain is orthogonal to the benefits provided by large-scale CPT, underscoring that explicit optimization for edit efficiency is a critical component for achieving high-quality grammatical error correction. The source code is publicly accessible at https://github.com/TW-NLP/ChineseErrorCorrector.
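To make the precision-oriented metric concrete, here is a small sketch of the standard $F_\beta$ computation from edit counts, together with the arithmetic behind a relative improvement. The function names and all counts below are illustrative assumptions, not figures from the paper. With $\beta = 0.5$, precision is weighted more heavily than recall, so a system that over-corrects loses more under $F_{0.5}$ than under F1.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from true-positive, false-positive, and false-negative edit counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, e.g. 0.08 means 8%."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Illustrative counts only (not results from the paper).
    conservative = dict(tp=3, fp=1, fn=3)    # few edits, mostly right
    over_correct = dict(tp=4, fp=4, fn=2)    # more hits, many spurious edits
    for name, counts in [("conservative", conservative), ("over-correcting", over_correct)]:
        print(name, "F0.5 =", round(f_beta(**counts, beta=0.5), 4),
              "F1 =", round(f_beta(**counts, beta=1.0), 4))
    # A hypothetical score moving from 50.0 to 54.0 is a relative gain of 0.08.
    print(relative_improvement(50.0, 54.0))
```

The gap between the two hypothetical systems is wider under $F_{0.5}$ than under F1, which is why a reward that discourages unnecessary edits aligns with this metric.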
