Global News Digest

arXiv

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

Title: CSRP: Leveraging Chain-of-Thought Reasoning and Efficiency-Aware Reinforcement Learning for Accurate Chinese Text Correction

Abstract

Chinese Grammatical Error Correction (CGEC) systems powered by Large Language Models (LLMs) encounter two significant hurdles. First, generic models often lack the specialized linguistic priors necessary to navigate subtle grammatical nuances. Second, Supervised Fine-Tuning (SFT) utilizing Maximum Likelihood Estimation (MLE) does not optimize for precision-oriented metrics, which frequently results in systematic over-correction. To address these issues, we introduce CSRP, a three-phase framework designed to incrementally enhance correction capabilities. This approach begins with Continual Pre-training (CPT) on a balanced dataset of 5.9 million samples to embed domain-specific knowledge. It is followed by Chain-of-Thought SFT, which employs explicit error reasoning to ensure diagnostic transparency, and concludes with Group Relative Policy Optimization. This final stage incorporates a novel Efficiency-Aware Reward mechanism that explicitly penalizes superfluous edits.
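The abstract does not give the exact form of the Efficiency-Aware Reward. The sketch below shows one way such a reward could plug into Group Relative Policy Optimization (GRPO). It rests on several assumptions. First, the reward is the share of reference edits recovered minus a fixed penalty for each superfluous edit. Second, edits are extracted at the character level with Python's standard difflib. Third, the advantage is the standard GRPO group-relative one: each sampled correction's reward is standardized against the mean and standard deviation of its group. The names char_edits, efficiency_aware_reward and group_relative_advantages, the penalty weight of 0.5, and the example sentences are hypothetical and are not taken from the CSRP code.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev

Edit = tuple[int, int, str]  # (start, end, replacement) span on the source string


def char_edits(source: str, target: str) -> set[Edit]:
    """Character-level edits that turn `source` into `target`."""
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    return {
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def efficiency_aware_reward(source: str, hypothesis: str, reference: str,
                            penalty: float = 0.5) -> float:
    """Hypothetical reward: share of reference edits recovered, minus a
    fixed penalty for every edit that the reference does not contain."""
    hyp_edits = char_edits(source, hypothesis)
    ref_edits = char_edits(source, reference)
    superfluous = len(hyp_edits - ref_edits)
    # A sentence that needs no correction gets full base credit;
    # any edit the model makes to it counts as superfluous.
    recovered = len(hyp_edits & ref_edits) / len(ref_edits) if ref_edits else 1.0
    return recovered - penalty * superfluous


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


source = "他去学校了了"      # hypothetical input with a duplicated 了
reference = "他去学校了"
samples = [
    "他去学校了",    # exact fix
    "他去学校了了",  # no change
    "她去学校了",    # fixes the error but also rewrites 他 -> 她
]
rewards = [efficiency_aware_reward(source, s, reference) for s in samples]
print(rewards)  # [1.0, 0.0, 0.5]
print(group_relative_advantages(rewards))
```

In this toy group, the over-correcting third sample still scores above the unchanged one, but its extra edit costs it half a point relative to the exact fix. The group-relative advantage then pushes the policy toward the minimal correction without needing a learned value function.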

Evaluations on the NACGEC benchmark show that CSRP attains state-of-the-art performance, with an $F_{0.5}$ score of 50.99 and a precision of 57.17. These results significantly surpass the previous best results and counteract the over-correction bias common in MLE-trained models. CSRP also raises Chinese Spelling Correction performance on the CSCD benchmark to an F1 score of 59.61, 5.20 points above GPT-4. Ablation studies show that the RL alignment phase yields an 8% relative improvement over the SFT baseline. Crucially, this gain is orthogonal to the benefits of large-scale CPT, indicating that explicit optimization for edit efficiency is a critical component of high-quality grammatical error correction. The source code is publicly accessible at https://github.com/TW-NLP/ChineseErrorCorrector.
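Because $\beta = 0.5$ weights precision more heavily than recall, $F_{0.5}$ favors a conservative corrector, which is why precision is reported alongside it. The sketch below uses the standard F-beta definition, not code from the paper. The function names f_beta and implied_recall are illustrative. Back-solving the formula with the reported precision (57.17) and $F_{0.5}$ (50.99) gives an implied recall of roughly 35.6. This is a derived figure under the standard definition; the abstract itself does not report recall.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_recall(precision: float, f_score: float, beta: float = 0.5) -> float:
    """Solve the F-beta formula for recall, given precision and the F-beta score."""
    b2 = beta * beta
    return f_score * b2 * precision / ((1 + b2) * precision - f_score)


# Same two values, swapped: F0.5 prefers the more precise system.
print(round(f_beta(0.6, 0.3), 4))  # 0.5
print(round(f_beta(0.3, 0.6), 4))  # 0.3333

# Back-solve recall from the reported NACGEC numbers (in percent).
print(round(implied_recall(57.17, 50.99), 1))  # 35.6
```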


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is discussing with regulators to ease ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.