GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning
Abstract
Transferring multi-step reasoning capabilities from large language models (LLMs) to smaller, more efficient student models is a persistent challenge, largely driven by issues such as noisy rationales, hallucinated supervision, and rigid teacher-student dynamics. Current distillation techniques, including those utilizing mentor models, typically function in an open-loop fashion. This approach implicitly presumes that the teacher is uniformly reliable, which often leads to the propagation of flawed intermediate reasoning steps.
To address these limitations, we introduce GateKD, a novel framework for confidence-gated closed-loop distillation. By positioning the teacher as a dynamic gatekeeper rather than a static oracle, GateKD facilitates more robust reasoning transfer. The framework integrates three synergistic mechanisms: (i) confidence-gated soft supervision, which filters and distills only reliable predictive signals; (ii) gated hidden-state evolution, ensuring that intermediate representations are aligned exclusively when the teacher’s confidence is high; and (iii) reliability-filtered attention distillation, which maintains stable reasoning structures while actively suppressing noisy patterns. Together, these elements create a closed feedback loop where the teacher’s confidence continuously modulates the distillation process, thereby minimizing the transfer of hallucinations and stabilizing the student’s reasoning capabilities.
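As a rough illustration of mechanism (i), the sketch below gates a standard soft-label distillation loss on teacher confidence. The abstract does not specify how confidence is measured or how the gate is applied, so everything here is an assumption made for illustration: confidence is taken to be the teacher's maximum softmax probability, the gate is a hard threshold, and the names `gated_kd_loss`, `threshold`, and `temperature` are hypothetical rather than GateKD's actual formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def gated_kd_loss(teacher_logits, student_logits, threshold=0.7, temperature=1.0):
    """Mean KL(teacher || student) over positions that pass the gate.

    Assumption: confidence is the teacher's max softmax probability, and
    positions below `threshold` are dropped entirely (a hard gate).
    Returns (loss, number_of_positions_kept).
    """
    total, kept = 0.0, 0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_logits, temperature)
        if max(p_teacher) < threshold:
            continue  # low-confidence teacher signal: do not distill it
        p_student = softmax(s_logits, temperature)
        total += kl_divergence(p_teacher, p_student)
        kept += 1
    return (total / kept if kept else 0.0), kept


# Position 0: the teacher is confident. Position 1: the teacher is near-uniform.
teacher = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
student = [[4.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
loss, kept = gated_kd_loss(teacher, student, threshold=0.7)
```

In this toy input only the first position passes the gate, so the student's disagreement with the uncertain teacher at the second position contributes nothing to the loss. A soft gate that weights each position by its confidence, instead of dropping it, would be an equally plausible reading of the description.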
We conducted extensive experiments across benchmarks for commonsense, logical, and symbolic reasoning, employing T5 and Flan-T5 backbones of various scales. The results demonstrate that GateKD consistently surpasses strong open-loop distillation baselines. Specifically, GateKD achieves significant improvements on logical and symbolic tasks and remains robust in low-resource distillation scenarios. Ablation studies show marked performance declines when any one of its gating components is removed, indicating that each mechanism contributes to the overall gains. These findings underscore the importance of confidence-gated closed-loop supervision in developing reliable and scalable small-scale reasoning models.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





