Capability Self-Assessment: Teaching LLMs to Know Their Limits
arXiv:2606.00251v1 Announce Type: new
Abstract:
Fundamental to the reliability of intelligent systems is the capacity to recognize one's own limitations and decide whether to tackle a problem directly or delegate it. However, we find that contemporary large language models systematically lack this skill: across model families and scales, they overestimate their own competence and attempt queries beyond their capabilities. We call this ability Capability Self-Assessment (CSA) and frame it as a policy-learning problem, with the goal of improving self-assessment without compromising the model's existing capabilities. We show that reinforcement learning is highly effective at teaching CSA, significantly outperforming supervised fine-tuning while preserving the model's original capabilities. In contrast, supervised fine-tuning severely degrades the very capabilities the model is meant to assess. Furthermore, the learned self-assessment behavior generalizes strongly out of distribution, indicating that CSA is a transferable trait among models. Finally, CSA has practical utility: it improves local-cloud decision-making at inference time and provides a useful signal for selecting targeted data during training.
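
The local-cloud use case mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the names `route`, `RoutingDecision`, and `toy_self_assess`, the score range of 0 to 1, and the 0.5 threshold are all assumptions made for illustration. The idea shown is only that a self-assessment score can gate whether a query is answered locally or delegated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutingDecision:
    target: str   # "local" or "cloud"
    score: float  # self-assessed probability of answering correctly


def route(
    query: str,
    self_assess: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> RoutingDecision:
    """Answer locally when the self-assessed score clears the threshold,
    otherwise delegate to a (hypothetical) larger cloud model."""
    score = self_assess(query)
    if not 0.0 <= score <= 1.0:
        raise ValueError("self_assess must return a value in [0, 1]")
    target = "local" if score >= threshold else "cloud"
    return RoutingDecision(target=target, score=score)


def toy_self_assess(query: str) -> float:
    # Stand-in for a model's self-assessment: short queries are treated
    # as easy. A real system would query the model itself.
    return 0.9 if len(query.split()) <= 8 else 0.2


easy = route("What is 2 + 2?", toy_self_assess)
hard = route(
    "Prove that every bounded monotone sequence of real numbers "
    "converges to its supremum",
    toy_self_assess,
)
```

In this sketch an overconfident model, the failure mode the abstract describes, would return high scores for hard queries and so would rarely delegate.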
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




