ProbScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference
Title: ProbScale: Leveraging Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference
Abstract: Small Language Models (SLMs) strike a crucial balance between functional capability and computational practicality. Guided by neural scaling laws, which suggest that these models develop rich internal representations proportional to their size, SLMs are optimized for training. Nevertheless, deploying them remains difficult when strict resource limitations are in place. Language model probing offers a mechanism for dissecting the linguistic knowledge embedded within a model's internal structures. To address deployment challenges, we introduce ProbScale, a framework that integrates principles from scaling laws and probing to pinpoint parameter-efficient subnetworks within pre-trained SLMs. ProbScale capitalizes on the high-fidelity representations inherent in well-scaled SLMs, employing task-specific probes to mathematically assess the importance of each layer for specific downstream capabilities. This approach enables the selection of subnetworks that achieve an optimal balance between performance and parameter count. We define the subnetwork selection process as identifying a layer subset that maximizes aggregated, task-weighted probe performance while adhering to a specified parameter budget. Our experiments, conducted on representative SLMs including RoBERTa-Large and T5-Base, show that ProbScale discovers subnetworks capable of reducing parameters by 5 to 10 times. Crucially, these streamlined models retain high performance, achieving 95% to 98% of the original SLMs' accuracy on targeted tasks and surpassing heuristic baseline methods.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
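
Illustrative sketch (not from the paper): the abstract frames subnetwork selection as choosing a layer subset that maximizes aggregated, task-weighted probe performance under a parameter budget. One way to read that is as a 0/1 knapsack problem. The code below assumes that reading, and further assumes that a subset's score is simply the sum of its layers' weighted probe scores. The abstract does not specify the actual aggregation rule or solver. The names `layer_value` and `select_layers`, the task labels, and all numbers are hypothetical, and parameter counts are treated as integers in arbitrary units.

```python
from typing import Dict, List, Sequence, Tuple


def layer_value(probe_scores: Dict[str, float], task_weights: Dict[str, float]) -> float:
    """Collapse one layer's per-task probe scores into a task-weighted score."""
    return sum(task_weights[task] * probe_scores[task] for task in task_weights)


def select_layers(
    probe_scores: Sequence[Dict[str, float]],
    layer_params: Sequence[int],
    task_weights: Dict[str, float],
    budget: int,
) -> Tuple[List[int], float]:
    """Pick layer indices maximizing summed weighted probe score within `budget`.

    Assumption: the objective is additive over layers (classic 0/1 knapsack).
    """
    values = [layer_value(scores, task_weights) for scores in probe_scores]
    n = len(values)
    # best[i][b]: best total value using the first i layers with at most b units.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, value = layer_params[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip layer i-1
            if cost <= b and best[i - 1][b - cost] + value > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + value  # option 2: keep it
    # Walk the table backwards to recover which layers were kept.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= layer_params[i - 1]
    chosen.reverse()
    return chosen, best[n][budget]


# Hypothetical probe accuracies for a 4-layer model on two made-up tasks.
example_scores = [
    {"pos": 0.60, "nli": 0.40},
    {"pos": 0.90, "nli": 0.50},
    {"pos": 0.80, "nli": 0.70},
    {"pos": 0.70, "nli": 0.90},
]
example_params = [3, 3, 3, 3]            # parameter cost per layer, arbitrary units
example_weights = {"pos": 0.5, "nli": 0.5}

if __name__ == "__main__":
    layers, score = select_layers(example_scores, example_params, example_weights, budget=6)
    print(layers, round(score, 3))
```

With a budget of 6 units, only two of the four equally sized layers fit, and the sketch keeps the two with the highest weighted probe scores (indices 2 and 3). A real pipeline would also need to check that the retained layers still compose into a working model, which this toy objective ignores.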




