arXiv

ProbScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

June 2, 2026 · Sourav Das

Title: ProbScale: Leveraging Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

Abstract: Small Language Models (SLMs) balance functional capability against computational practicality. Neural scaling laws, which suggest that the richness of a model’s internal representations grows with its size, guide how SLMs are trained. Deploying them nevertheless remains difficult under strict resource constraints. Language model probing offers a way to examine the linguistic knowledge encoded in a model’s internal representations. To address these deployment challenges, we introduce ProbScale, a framework that combines principles from scaling laws and probing to identify parameter-efficient subnetworks within pre-trained SLMs. ProbScale exploits the high-fidelity representations of well-scaled SLMs, using task-specific probes to quantify the importance of each layer for specific downstream capabilities. This enables the selection of subnetworks that balance performance against parameter count. We formulate subnetwork selection as finding a layer subset that maximizes aggregated, task-weighted probe performance subject to a specified parameter budget. Our experiments on representative SLMs, including RoBERTa-Large and T5-Base, show that ProbScale discovers subnetworks with 5 to 10 times fewer parameters. These smaller models retain 95% to 98% of the original SLMs’ accuracy on targeted tasks and outperform heuristic baselines.
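The selection objective described in the abstract (maximize aggregated, task-weighted probe performance under a parameter budget) can be read as a 0/1 knapsack problem over layers. The following is a minimal sketch under that reading, not the authors' implementation: the function names `layer_importance` and `select_layers`, the example tasks, and the choice of an exact dynamic-programming solver are all assumptions made for illustration. The abstract does not say how the optimization is solved or how probe scores are aggregated beyond "task-weighted".

```python
from typing import Dict, List, Sequence


def layer_importance(probe_acc: Dict[str, Sequence[float]],
                     task_weights: Dict[str, float]) -> List[float]:
    """Combine per-task probe accuracies into one weighted score per layer.

    probe_acc[task][i] is the (hypothetical) probe accuracy of layer i on task.
    """
    n_layers = len(next(iter(probe_acc.values())))
    scores = [0.0] * n_layers
    for task, accs in probe_acc.items():
        for i, acc in enumerate(accs):
            scores[i] += task_weights[task] * acc
    return scores


def select_layers(scores: Sequence[float],
                  layer_params: Sequence[int],
                  budget: int) -> List[int]:
    """Pick the layer subset with maximal total score and sum(params) <= budget.

    Assumption: layer_params and budget are integers in one coarse unit
    (for example, millions of parameters) so the DP table stays small.
    """
    n = len(scores)
    # best[i][b]: best total score using only the first i layers within budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, gain = layer_params[i - 1], scores[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip layer i-1
            if cost <= b and best[i - 1][b - cost] + gain > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + gain  # option 2: keep it
    # Walk back through the table to recover which layers were kept.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= layer_params[i - 1]
    return sorted(chosen)


# Toy example: 4 layers of 3 (million) parameters each, budget of 6.
probe_acc = {"pos": [0.6, 0.9, 0.7, 0.5], "ner": [0.4, 0.5, 0.9, 0.8]}
task_weights = {"pos": 0.5, "ner": 0.5}
scores = layer_importance(probe_acc, task_weights)
kept = select_layers(scores, layer_params=[3, 3, 3, 3], budget=6)
print(kept)
```

When every layer has the same parameter count, as in the toy example, the knapsack reduces to keeping the top-scoring layers that fit the budget. The sketch also ignores embeddings, output heads, and any dependence between layers, which a real subnetwork extraction would need to account for.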

Source: arXiv