arXiv

Pruning Deep Neural Networks via the Marchenko--Pastur Distribution

June 3, 2026 · Leonid Berlyand, Theo Bourdais, Houman Owhadi, Yitzchak Shmalo

Title: Leveraging the Marchenko–Pastur Distribution for Deep Neural Network Pruning

Abstract: This study applies Marchenko–Pastur (MP) random-matrix theory to the pruning of deep neural networks, targeting settings where few resources are available for post-pruning fine-tuning. The main practical advantage is that accuracy is maintained with brief calibration and fine-tuning phases, without an extensive reoptimization pipeline. The theoretical analysis gives deterministic data-path guarantees: pruning reduces an elastic-net objective and preserves every sample whose dense margin exceeds twice the perturbation, provided the removed component $R$ has a small propagated logit effect, measured by $L_s \| R \psi_1(s) \|_\infty$. In the zero-budget case, pruning is perfect. Furthermore, a prune–restore extension allows weights to be restored within a fixed sparse-execution framework, and an additive $L_2$-regularized model shows that admissible random-like components vanish in the training limit, while persistent spikes stabilize as the MP bulk collapses. Under iid-Gaussian conditions, the fitted MP edge $\sigma_+$ serves as a high-probability layerwise budget indicator.
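The margin condition above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the authors' procedure: the noise level `sigma` is taken as known rather than fitted, the pruned layer is the final linear layer (so the Lipschitz factor $L_s$ is 1 and the raw input `x` stands in for $\psi_1(s)$), and the helper names `mp_upper_edge`, `split_at_edge`, and `certified_unchanged` are hypothetical. The edge used is the standard asymptotic largest singular value $\sigma(\sqrt{N} + \sqrt{M})$ of an $N \times M$ matrix with iid entries of standard deviation $\sigma$.

```python
import numpy as np

def mp_upper_edge(n_rows, n_cols, sigma):
    # Asymptotic largest singular value of an n_rows x n_cols matrix
    # whose entries are iid with standard deviation sigma.
    return sigma * (np.sqrt(n_rows) + np.sqrt(n_cols))

def split_at_edge(W, sigma):
    # Keep singular directions above the edge; the rest is the removed part R.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > mp_upper_edge(W.shape[0], W.shape[1], sigma)
    W_kept = (U[:, keep] * s[keep]) @ Vt[keep]
    return W_kept, W - W_kept

def certified_unchanged(W, R, x):
    # Dense margin: top logit minus runner-up logit.
    logits = np.sort(W @ x)
    margin = logits[-1] - logits[-2]
    # Sup-norm logit perturbation caused by removing R (L_s = 1 assumed).
    eps = np.max(np.abs(R @ x))
    return margin > 2.0 * eps

# Synthetic layer: rank-3 "signal" plus iid Gaussian noise (assumed setup).
rng = np.random.default_rng(0)
n_classes, n_features, sigma = 10, 400, 0.02
signal = rng.standard_normal((n_classes, 3)) @ rng.standard_normal((3, n_features))
W = signal + sigma * rng.standard_normal((n_classes, n_features))
W_kept, R = split_at_edge(W, sigma)

X = rng.standard_normal((200, n_features))
certified = np.array([certified_unchanged(W, R, x) for x in X])
same_pred = np.argmax(X @ W.T, axis=1) == np.argmax(X @ W_kept.T, axis=1)
```

Whenever `certified` is true for a sample, the pruned and dense predictions must agree: each logit moves by at most `eps`, so a gap larger than `2 * eps` cannot be closed. The converse does not hold, so uncertified samples may still be classified identically.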

On ImageNet-1k, ViT-B/16 with $2{:}4{+}$ToMe compression reaches $83.41\%$ top-1 accuracy after only three distillation epochs, a drop of $1.70$ percentage points (pp) from the dense baseline, with a $59.81\%$ reduction in sparse-execution MACs. With the same checkpoint and ToMe graph, the native-$2{:}4$ backend on an A40 yields a $1.388\times$ speedup; a separate endpoint on an A100 without ToMe achieves a $2.705\times$ speedup.

For structured sparsity, the ViT-B/16 $6{:}12$ configuration reaches $83.74\%$ accuracy. ViT-L/16 with dense+permutation attains $85.33\%$ ($-0.51$ pp from dense), and the ConvNeXtV2-Base $12{:}16$ model reaches $86.35\%$ ($-0.37$ pp). Among convolutional networks, the ResNet50 $8{:}16$ dense+permutation variant achieves $75.87\%$ ($-0.26$ pp), and the ResNet152d CAST-conv+permutation model reaches $81.33\%$ ($-1.53$ pp). These CNN results are reported at approximately $50\%$ MAC accounting, with a $1.62\times$ speedup on the A40 im2col$+2{:}4$ sparse-GEMM audit.
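The $N{:}M$ patterns quoted above ($2{:}4$, $6{:}12$, $8{:}16$, $12{:}16$) keep $N$ weights in every group of $M$ consecutive weights. A minimal sketch of building such a mask follows. It assumes magnitude-based selection within each group, which is a common baseline and not necessarily the criterion used in the paper; the function name `nm_mask` is hypothetical.

```python
import numpy as np

def nm_mask(W, n_keep, group):
    """Boolean mask keeping the n_keep largest-magnitude weights
    in every run of `group` consecutive weights along each row."""
    rows, cols = W.shape
    if cols % group != 0:
        raise ValueError("number of columns must be a multiple of the group size")
    groups = np.abs(W).reshape(rows, cols // group, group)
    # Indices of the n_keep largest magnitudes inside each group.
    top = np.argsort(groups, axis=-1)[..., -n_keep:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    return mask.reshape(rows, cols)

# Toy weights with alternating signs: magnitudes 1..16 laid out in two rows.
W = np.arange(1, 17, dtype=float).reshape(2, 8) * np.array([1, -1] * 4)
mask = nm_mask(W, n_keep=2, group=4)   # a 2:4 pattern
W_sparse = W * mask
```

The resulting density is `n_keep / group`, so $2{:}4$, $6{:}12$, and $8{:}16$ all keep half of the weights while $12{:}16$ keeps three quarters.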

Source: arXiv Generated at: 2026-06-03 00:00:00 UTC