arXiv

Learning Fine-grained Parameter Sharing via Sparse Tensor Decomposition

June 2, 2026 · Cem Üyük, Mike Lasby, Mohamed Yassin, Utku Evci, Yani Ioannou

Title: Achieving Fine-Grained Parameter Sharing Through Sparse Tensor Decomposition

Abstract:

While large neural networks deliver state-of-the-art results across numerous applications, their massive scale creates significant barriers to deployment on devices with limited resources. Although various compression techniques exist, cross-layer parameter sharing has seen limited exploration within transformer architectures. To address this gap, we propose Fine-grained Parameter Sharing (FiPS), a comprehensive framework designed to compress Multi-Layer Perceptrons (MLPs) in transformers. FiPS integrates low-rank factorization, sparsity, and cross-block parameter sharing into a single, unified optimization process.

The method works by concatenating MLP weight matrices from a selected group of transformer blocks and decomposing them into two components: a shared basis and sparse, layer-specific projection matrices. Both components are initialized using Singular Value Decomposition (SVD) and are jointly refined through the minimization of block-wise reconstruction error.
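The initialization step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes every block's MLP weight has the same shape, concatenates along the hidden dimension, and uses simple magnitude pruning to sparsify the projections. The function names `shared_basis_init` and `reconstruction_errors`, the `rank` and `sparsity` parameters, and the weight layout are all hypothetical. The joint refinement that minimizes block-wise reconstruction error is not shown.

```python
import numpy as np

def shared_basis_init(weights, rank, sparsity):
    """Factor concatenated weights into one shared basis and sparse per-block projections.

    weights  : list of (d_model, d_hidden) arrays, one per block (assumed equal shapes)
    rank     : number of shared basis vectors to keep
    sparsity : fraction of entries in each projection to set to zero
    """
    # Concatenate along the hidden dimension so all blocks share one column space.
    W = np.concatenate(weights, axis=1)            # (d_model, n_blocks * d_hidden)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)

    # Truncated SVD: the scaled left singular vectors act as the shared basis.
    basis = U[:, :rank] * S[:rank]                 # (d_model, rank)
    proj = Vt[:rank, :]                            # (rank, n_blocks * d_hidden)

    # Split back into per-block projections and zero the smallest-magnitude entries
    # (magnitude pruning is an assumption of this sketch).
    projections = []
    for P in np.split(proj, len(weights), axis=1):
        k = int(sparsity * P.size)                 # at least k entries are zeroed
        if k > 0:
            threshold = np.partition(np.abs(P).ravel(), k - 1)[k - 1]
            P = np.where(np.abs(P) > threshold, P, 0.0)
        projections.append(P)
    return basis, projections

def reconstruction_errors(weights, basis, projections):
    """Relative Frobenius reconstruction error for each block."""
    return [
        np.linalg.norm(W - basis @ P) / np.linalg.norm(W)
        for W, P in zip(weights, projections)
    ]

# Toy usage with random matrices standing in for MLP weights.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 256)) for _ in range(4)]
basis, projections = shared_basis_init(weights, rank=32, sparsity=0.5)
errors = reconstruction_errors(weights, basis, projections)
```

Each block then stores only its sparse projection, while the dense basis is stored once for the whole group. The per-block errors returned by `reconstruction_errors` are the quantity a subsequent refinement stage would aim to reduce.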

Our experiments demonstrate that FiPS can reduce the size of Vision Transformers (ViTs) by as much as 33% with a top-1 accuracy drop of less than 1% on ImageNet-1k. When fine-tuning is applied, compression ratios reach up to 57%. For Large Language Models (LLMs), FiPS achieves compression rates of up to 20%, surpassing existing SVD-based techniques in both perplexity and downstream benchmark performance at equivalent compression levels. Furthermore, when paired with Quantization-Aware Training (QAT), a 3-bit FiPS implementation on the Gemma-2-2B model yields lower perplexity than 2-bit QAT alone, while maintaining an identical 8x compression ratio. These findings confirm that fine-grained parameter sharing is a viable and efficient strategy for compressing transformer MLPs.
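As a back-of-the-envelope check on the "identical 8x" comparison, the short calculation below estimates what fraction of parameters can remain at 3 bits to match the size of 2-bit quantization. It assumes a 16-bit baseline and ignores any storage overhead for sparse indices or uncompressed layers; neither assumption is stated in the abstract, and the helper `params_kept_for_target` is hypothetical.

```python
from fractions import Fraction

def params_kept_for_target(baseline_bits, quant_bits, target_ratio):
    """Fraction of parameters that may remain so that quantization plus
    parameter reduction together reach target_ratio (overall size reduction)."""
    # Quantization alone gives baseline_bits / quant_bits; the remainder of the
    # target ratio has to come from storing fewer parameters.
    return Fraction(baseline_bits, quant_bits) / target_ratio

kept_3bit = params_kept_for_target(16, 3, 8)   # 3-bit weights, 8x overall target
kept_2bit = params_kept_for_target(16, 2, 8)   # 2-bit weights, 8x overall target
```

Under these assumptions, 2-bit quantization reaches 8x with all parameters kept, while a 3-bit model needs to store about two thirds of the original parameter count to reach the same ratio.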
