Hint Tuning: Less Data Makes Better Reasoners
Abstract:
While large reasoning models attain high accuracy via extended chain-of-thought processes, they often produce 5 to 8 times more tokens than necessary, applying verbose reasoning uniformly irrespective of task complexity. To address this, we introduce Hint Tuning, a method that requires minimal data to teach models to calibrate their reasoning depth. Our central premise is that the corresponding instruction-following model acts as an optimal difficulty probe. By evaluating the instruct model’s performance under varying levels of guidance, we can automatically generate training data encompassing three distinct states: No-Hint (for direct answers), Sparse-Hint (utilizing minimal prefixes), and Full-Hint (providing complete reasoning). This strategy transforms the subjective problem of labeling difficulty into an objective consistency check between the reasoning and instruct models. Leveraging just 1,000 self-annotated samples, Hint Tuning reduces token usage by 24–66% (averaging 31.5%) across various scales (4B–32B) of mainstream reasoning models, including Qwen3-Thinking and DeepSeek-R1-Distill, without compromising accuracy on five major benchmarks. In contrast to approaches dependent on extensive distillation datasets or costly reinforcement learning, we secure superior efficiency by simply aligning with the capabilities of the instruct model. The associated code and data can be accessed at https://github.com/redai-infra/hint-tuning.
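The three-state labeling described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the procedure in detail, so everything here is an assumption made for illustration: the function name `label_hint_level`, the `sparse_steps` parameter, the exact-match consistency check, the prompt format, and the toy stand-in for the instruct model are all hypothetical and are not taken from the released code. In this sketch, Full-Hint is simply the fallback when the lighter guidance levels fail.

```python
from typing import Callable, List, Tuple


def label_hint_level(
    question: str,
    reasoning_steps: List[str],
    reference_answer: str,
    instruct_answer: Callable[[str], str],
    sparse_steps: int = 1,
) -> Tuple[str, str]:
    """Probe difficulty by giving the instruct model increasing guidance.

    Returns (label, hint_text). All names and the prompt format are
    hypothetical; this is not the authors' implementation.
    """

    def consistent(hint: str) -> bool:
        # Assumed consistency check: exact match with the reference answer.
        prompt = question if not hint else f"{question}\nHint: {hint}"
        return instruct_answer(prompt).strip() == reference_answer.strip()

    # No-Hint: the instruct model already answers correctly on its own.
    if consistent(""):
        return "no_hint", ""

    # Sparse-Hint: a short prefix of the reasoning trace is enough.
    sparse = " ".join(reasoning_steps[:sparse_steps])
    if consistent(sparse):
        return "sparse_hint", sparse

    # Full-Hint: fall back to the complete reasoning trace.
    return "full_hint", " ".join(reasoning_steps)


def toy_instruct_answer(prompt: str) -> str:
    """Hypothetical stand-in for an instruct model, used only for the demo."""
    if "2 + 2" in prompt:
        return "4"
    if "17 * 23" in prompt and "17 * 20 = 340" in prompt:
        return "391"
    return "unknown"


easy = label_hint_level("What is 2 + 2?", ["2 + 2 = 4"], "4", toy_instruct_answer)
medium = label_hint_level(
    "What is 17 * 23?",
    ["17 * 20 = 340", "17 * 3 = 51", "340 + 51 = 391"],
    "391",
    toy_instruct_answer,
)
hard = label_hint_level(
    "What is 123 * 457?", ["step A", "step B"], "56211", toy_instruct_answer
)
print(easy[0], medium[0], hard[0])
```

In a real pipeline, `instruct_answer` would wrap a call to the instruct model, and the consistency check would likely need answer normalization rather than exact string matching.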
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




