arXiv

AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

June 2, 2026 · Liu Qing, Ou Wu, Yi Du

Abstract:

Effective post-training of Large Language Models (LLMs) hinges on token selection. Current approaches largely depend on local heuristics and rarely treat token selection as a rigorous valuation of individual tokens within a response. To address this, we present AlphaToken, a framework that values response tokens along two decoupled components: adaptation, which drives learning for the target task, and stability, which safeguards pre-trained capabilities. To make both components path-aware, AlphaToken combines direct-path signals derived from local token gradients with downstream causal-path signals that arise from autoregressive generation.

Given that retention data is typically inaccessible, AlphaToken estimates stability using a Fisher-drift proxy anchored to the pre-trained reference model. To facilitate efficient computation at the token level, we adapt the Ghost Dot-Product technique. During fine-tuning and preference optimization, AlphaToken filters out low-value response tokens, thereby focusing training signals on positions that offer higher value. Our experimental results demonstrate that AlphaToken enhances post-training performance while effectively reducing catastrophic forgetting.
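
As a rough illustration of the filtering step only, the sketch below masks low-value response tokens out of a per-token training loss. It is not the authors' implementation: the per-token values are assumed to be given (how AlphaToken computes them from its adaptation and stability components is not shown here), and the top-fraction rule, the `keep_ratio` parameter, and the function names `select_tokens` and `masked_mean_loss` are all hypothetical choices made for this example.

```python
import math


def select_tokens(values, keep_ratio):
    """Return a 0/1 mask that keeps the top `keep_ratio` fraction of tokens.

    `values` holds one score per response token (assumed precomputed).
    The top-fraction rule is an assumption of this sketch; the abstract
    does not state how the filtering threshold is chosen.
    """
    if not values:
        return []
    k = max(1, math.ceil(keep_ratio * len(values)))
    # Rank positions by descending value; ties go to the earlier position.
    ranked = sorted(range(len(values)), key=lambda i: (-values[i], i))
    kept = set(ranked[:k])
    return [1 if i in kept else 0 for i in range(len(values))]


def masked_mean_loss(token_losses, mask):
    """Average the per-token loss over the kept positions only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(loss * m for loss, m in zip(token_losses, mask)) / kept


# Toy response of four tokens with made-up losses and values.
token_losses = [2.0, 0.5, 1.0, 4.0]
token_values = [0.9, -0.2, 0.1, 0.7]

mask = select_tokens(token_values, keep_ratio=0.5)
loss = masked_mean_loss(token_losses, mask)
```

In an actual training loop the same mask would typically be applied to the per-token cross-entropy (or preference-loss) terms before reduction, so that filtered positions contribute no gradient.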
