Global News Digest

arXiv

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

Title: RAFT: Mitigating Catastrophic Forgetting in Domain Fine-Tuning via Data Refinement and Adaptive Distillation

Abstract:

Supervised fine-tuning (SFT) tailored to specific domains typically enhances performance within that niche but often compromises the model’s broader, general-purpose capabilities. We analyze this decline by identifying two critical deficiencies inherent in domain-specific SFT. The first is the supervision-compatibility gap, which arises because domain-specific targets frequently exhibit distinct stylistic and reasoning patterns that diverge from the natural responses generated by the pre-trained model. The second is the trajectory-preservation gap, where teacher-forced SFT focuses solely on optimizing fixed target tokens, neglecting to constrain how the model behaves when generating its own prefixes. Consequently, the model fails to retain its original behavioral traits.

To resolve these issues, we introduce RAFT (Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting), a two-stage framework designed to address both gaps. In the initial stage, RAFT generates supervision that is compatible with the model by employing answer fusion, semantic filtering, and self-conditioned rewriting. In the second stage, it implements Answer-Conditioned On-Policy Distillation. During this process, the original instruction-tuned model acts as a teacher, providing soft targets for trajectories generated by the student model, while the fused answer serves as contextual conditioning. To further stabilize the balance between domain-specific and general capabilities, we incorporate top-K temperature distillation and adaptive loss balancing based on Exponential Moving Average (EMA).
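The abstract names top-K temperature distillation and EMA-based adaptive loss balancing without giving formulas, so the following is only a minimal sketch of how the two ideas might fit together, not the authors' implementation. It assumes that the distillation term is a KL divergence restricted to the teacher's top-K tokens and renormalized, and that the balancing weight is the ratio of the smoothed loss magnitudes. The names `topk_distill_loss` and `EmaLossBalancer`, and the defaults for `k`, `temperature`, and `decay`, are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def topk_distill_loss(teacher_logits, student_logits, k=2, temperature=2.0):
    """KL(teacher || student) for one token position.

    Assumption: only the teacher's k highest-scoring tokens are kept, and
    both distributions are renormalized over that subset.
    """
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    p = softmax([teacher_logits[i] for i in top], temperature)
    q = softmax([student_logits[i] for i in top], temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


class EmaLossBalancer:
    """Tracks an exponential moving average of each loss term.

    Assumption: the distillation term is rescaled so that its smoothed
    magnitude matches that of the domain term.
    """

    def __init__(self, decay=0.9):
        self.decay = decay
        self.ema_domain = None
        self.ema_distill = None

    def combine(self, domain_loss, distill_loss):
        if self.ema_domain is None:
            # First step: initialize both averages with the observed values.
            self.ema_domain, self.ema_distill = domain_loss, distill_loss
        else:
            d = self.decay
            self.ema_domain = d * self.ema_domain + (1 - d) * domain_loss
            self.ema_distill = d * self.ema_distill + (1 - d) * distill_loss
        weight = self.ema_domain / (self.ema_distill + 1e-8)
        return domain_loss + weight * distill_loss, weight
```

In this sketch, a student whose logits match the teacher's incurs zero distillation loss, and the balancer raises the distillation weight whenever the smoothed domain loss grows relative to the smoothed distillation loss.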

Experimental evaluations across five domains using three instruction-tuned backbone models demonstrate RAFT’s efficacy. It achieves an average domain accuracy improvement of 23.2% compared to standard SFT. Furthermore, it partially restores general capabilities degraded by SFT, yielding relative improvements of 18.2% on MS-Bench and 10.2% on IFEval. These findings indicate that integrating data refinement with trajectory-level preservation offers a robust strategy for domain fine-tuning that minimizes catastrophic forgetting.


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
