ThinkSwitch: Context Distillation with LoRA and Weight Interpolation for Specific-Purpose Reasoning Tasks
Title: ThinkSwitch: Leveraging LoRA and Weight Interpolation for Context Distillation in Specialized Reasoning
Abstract:
Large language models often perform better on complex problems when they spend additional inference-time compute generating a reasoning trace before the final answer, but this comes at the cost of higher latency, higher token expenses, and greater deployment complexity. To address these costs, we propose ThinkSwitch, a resource-efficient co-training method for paired instruct and thinking checkpoints.
Starting from compatible instruct and thinking checkpoints of Qwen3-4B, the procedure operates iteratively: the thinking checkpoint generates answers to the task prompts, after which the reasoning traces are stripped away. The resulting prompt-answer pairs are then distilled into the instruct checkpoint using QLoRA. Concurrently, a new thinking checkpoint is reconstructed via spherical weight interpolation. The process requires no manual labeling: the only human input is the set of task prompts, and the model generates the corresponding labels itself.
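As a rough illustration of two steps in this loop, the following minimal Python sketch shows trace stripping and spherical interpolation on flat weight vectors. It is not the authors' implementation. It assumes that reasoning traces are wrapped in `<think>...</think>` tags, that weights can be treated as flat lists of floats, and that a single interpolation factor `t` is used; the function names `strip_reasoning` and `slerp` are hypothetical, and the abstract does not state which checkpoints are interpolated or with what factor.

```python
import math
import re

# Assumption: the thinking checkpoint wraps its trace in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(completion: str) -> str:
    """Drop the reasoning trace and keep only the final answer text."""
    return THINK_BLOCK.sub("", completion).strip()


def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    The angle is measured between the normalized vectors; the resulting
    coefficients are applied to the original (unnormalized) weights.
    Falls back to linear interpolation when a vector is near zero or the
    two vectors are nearly (anti)parallel.
    """
    norm0 = math.sqrt(sum(x * x for x in w0))
    norm1 = math.sqrt(sum(x * x for x in w1))
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

    cos_omega = sum(a * b for a, b in zip(w0, w1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

    c0 = math.sin((1 - t) * omega) / sin_omega
    c1 = math.sin(t * omega) / sin_omega
    return [c0 * a + c1 * b for a, b in zip(w0, w1)]


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    print(strip_reasoning(raw))
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

In practice the interpolation would presumably be applied per tensor across the model's state dict rather than to a single flat list, and the stripped prompt-answer pairs would feed a QLoRA fine-tuning run; both are omitted here to keep the sketch self-contained.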
Experimental results on a 30-question subset of AIME 2026 demonstrate that ThinkSwitch raises the instruct checkpoint’s score from 10/30 to 20/30, and the thinking checkpoint’s from 14/30 to 22/30. Similarly, on a 30-question segment of PubMedQA, the instruct checkpoint improved from 13/30 to 18/30, while the thinking checkpoint rose from 18/30 to 25/30. The entire experiment was conducted using 15 training prompts per domain at a total cost of $2.86 on a single cloud-based RTX 3070. Although these findings stem from a small-scale study, they suggest that targeted distillation cycles can effectively embed the advantages of explicit reasoning into model weights, all while maintaining a distinct thinking mode.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




