CAREF: Calibration-Aware Regularization for Explanation Faithfulness Without Rationale Supervision
Title: CAREF: Achieving Faithful Explanations Through Calibration-Aware Regularization Without Rationale Supervision
Abstract: This paper presents CAREF, a parameter-efficient fine-tuning method that improves both predictive performance and explanation faithfulness through calibration-aware regularization. At its core is the Calibration-Aware Regularization for Explanation Faithfulness (LSCED) loss, a unified objective that combines entropy-based calibration with token-level sparsity control and requires no rationale supervision. On four Natural Language Explanation (NLE) benchmarks (COS-E, ECQA, ComVE, and e-SNLI) with the Flan-T5 model, the lightweight CAREF-AQ variant achieves the highest average accuracy (89.04) and an explanation alignment score of 81.00 (nBERT) while using only 6.43% of the trainable parameters, surpassing both LoRA and AdaLoRA. To our knowledge, CAREF is the first technique to combine entropy and sparsity regularization in a single training objective for fine-tuning interpretable Large Language Models.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
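
The abstract describes a single objective that combines entropy-based calibration with token-level sparsity control, but it does not give the formula. The sketch below is one plausible shape for such an objective, not the paper's actual loss. It makes three assumptions: the calibration term is a confidence penalty that rewards higher predictive entropy, the sparsity term is a mean L1 penalty on per-token importance weights, and the two are added to cross-entropy. The names `combined_loss`, `token_gates`, `lam_entropy`, and `lam_sparsity` are hypothetical, as are the default weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def combined_loss(logits, label, token_gates,
                  lam_entropy=0.1, lam_sparsity=0.01):
    """Illustrative entropy + sparsity regularized loss (assumed form).

    logits       -- unnormalized class scores for one example
    label        -- index of the gold class
    token_gates  -- hypothetical per-token importance weights
    """
    if not token_gates:
        raise ValueError("token_gates must be non-empty")
    probs = softmax(logits)
    # Task term: standard cross-entropy on the gold label.
    ce = -math.log(probs[label])
    # Calibration term (assumption): subtracting entropy penalizes
    # overconfident, low-entropy predictions.
    ent = entropy(probs)
    # Sparsity term (assumption): mean absolute gate value, which
    # pushes most token weights toward zero.
    sparsity = sum(abs(g) for g in token_gates) / len(token_gates)
    return ce - lam_entropy * ent + lam_sparsity * sparsity
```

Under these assumptions, two examples with identical logits differ only in the sparsity term, so the one whose token weights are concentrated on fewer tokens receives the lower loss. In practice the two weights would need tuning against a calibration metric and an explanation metric.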





