Title: Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
Abstract: Chain-of-Thought prompting delivers high accuracy on advanced reasoning tasks, but it often incurs high latency and inference cost at test time. The conventional alternative, fine-tuning a smaller model, tends to compromise interpretability and adds considerable operational and resource burdens. To overcome these challenges, we propose Prompt-Level Distillation (PLD). This method extracts distinct reasoning patterns from a Teacher model and formats them into a comprehensive list of expressive instructions, which are then integrated into the Student model’s System Prompt.
In evaluations with Gemma-3 4B, PLD raised Macro F1 from 57% to 90.0% on StereoSet and from 67% to 83% on Contract-NLI, and LogiQA accuracy reached 70%. Comparable results with Mistral Small 3.1 confirm that the approach generalizes across architectures, allowing compact models to achieve frontier-level performance with minimal latency impact. Because the expressive instructions make the decision-making process fully transparent, humans can verify the logic in full. This transparency makes the method particularly well suited to highly regulated sectors such as finance, law, and content moderation, as well as to high-volume applications and edge devices.
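The pipeline described above (Teacher reasoning patterns, consolidated into an instruction list, placed in the Student's System Prompt) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `distill_instructions` and `build_system_prompt`, the sample rationales, and the label names are hypothetical, and exact-match de-duplication is only a stand-in for whatever extraction and consolidation procedure the authors actually use, which the abstract does not specify. No model is called; the Teacher output is hard-coded.

```python
from collections import OrderedDict


def distill_instructions(teacher_rationales):
    """Collapse teacher rationales into an ordered, de-duplicated instruction list.

    Assumption: duplicates are detected by case- and whitespace-insensitive
    exact match, a simple stand-in for the paper's consolidation step.
    """
    seen = OrderedDict()
    for rationale in teacher_rationales:
        # Normalize case and whitespace so near-identical patterns merge.
        key = " ".join(rationale.lower().split())
        if key and key not in seen:
            # Keep the first spelling encountered, with whitespace tidied.
            seen[key] = " ".join(rationale.split())
    return list(seen.values())


def build_system_prompt(task_description, instructions):
    """Format the distilled instructions as a numbered list in a system prompt."""
    lines = [task_description.strip(), "", "Follow these instructions:"]
    lines += [f"{i}. {text}" for i, text in enumerate(instructions, start=1)]
    return "\n".join(lines)


# Hypothetical rationales; in practice these would come from a Teacher model.
rationales = [
    "If the clause is silent on the hypothesis, answer NotMentioned.",
    "if the clause is silent on the hypothesis,  answer NotMentioned.",
    "If the clause explicitly permits the action, answer Entailment.",
]

instructions = distill_instructions(rationales)
system_prompt = build_system_prompt(
    "Classify whether the contract clause entails the hypothesis.",
    instructions,
)
print(system_prompt)
```

The resulting string would be passed as the system message to the Student model. Because the instructions are plain text, a reviewer can audit them directly, which is the transparency property the abstract emphasizes.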
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC





