Learning Fine-grained Parameter Sharing via Sparse Tensor Decomposition
Title: Achieving Fine-Grained Parameter Sharing Through Sparse Tensor Decomposition
Abstract:
While large neural networks deliver state-of-the-art results across numerous applications, their massive scale creates significant barriers to deployment on devices with limited resources. Although various compression techniques exist, cross-layer parameter sharing has seen limited exploration within transformer architectures. To address this gap, we propose Fine-grained Parameter Sharing (FiPS), a comprehensive framework designed to compress Multi-Layer Perceptrons (MLPs) in transformers. FiPS integrates low-rank factorization, sparsity, and cross-block parameter sharing into a single, unified optimization process.
The method works by concatenating MLP weight matrices from a selected group of transformer blocks and decomposing them into two components: a shared basis and sparse, layer-specific projection matrices. Both components are initialized using Singular Value Decomposition (SVD) and are jointly refined through the minimization of block-wise reconstruction error.
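The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`svd_init`, `magnitude_mask`, `refine`), the toy matrix sizes, the concatenation axis, the use of magnitude pruning to obtain sparsity, and the 50% sparsity level are all hypothetical choices. The abstract refines the factors by minimizing block-wise reconstruction error; the sketch substitutes a simpler objective (Frobenius error on the concatenated weights) and a simpler optimizer (alternating least squares on a fixed sparsity mask).

```python
import numpy as np

def svd_init(weights, rank):
    """Concatenate per-block matrices and factor them with a truncated SVD."""
    W = np.concatenate(weights, axis=1)            # (d, k*p); axis choice is an assumption
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    basis = U[:, :rank] * s[:rank]                 # shared basis, shape (d, rank)
    proj = Vt[:rank, :]                            # dense projections, shape (rank, k*p)
    return W, basis, proj

def magnitude_mask(proj, sparsity):
    """Boolean mask keeping the largest-magnitude entries.

    `sparsity` is the fraction of entries set to zero (assumed pruning rule).
    """
    keep = max(1, int(round(proj.size * (1.0 - sparsity))))
    threshold = np.sort(np.abs(proj), axis=None)[-keep]
    return np.abs(proj) >= threshold

def refine(W, basis, proj, mask, iters=10):
    """Alternating least squares on ||W - basis @ proj||_F with a fixed mask."""
    proj = proj * mask                             # new array; pruned entries are zero
    for _ in range(iters):
        # Re-fit each projection column on its own sparse support.
        for j in range(proj.shape[1]):
            support = mask[:, j]
            if support.any():
                sol = np.linalg.lstsq(basis[:, support], W[:, j], rcond=None)[0]
                proj[support, j] = sol
        # Re-fit the shared basis with the projections held fixed.
        basis = np.linalg.lstsq(proj.T, W.T, rcond=None)[0].T
    return basis, proj

# Toy stand-ins for the MLP weights of four transformer blocks.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal((16, 32)) for _ in range(4)]

W, basis, proj = svd_init(blocks, rank=12)
mask = magnitude_mask(proj, sparsity=0.5)
err_pruned = np.linalg.norm(W - basis @ (proj * mask))

basis, proj = refine(W, basis, proj, mask)
err_refined = np.linalg.norm(W - basis @ proj)

# One sparse projection matrix per block, all sharing `basis`.
per_block = np.split(proj, len(blocks), axis=1)
```

Each alternating step solves its subproblem exactly, so the refined error cannot exceed the error measured right after pruning. In this sketch the stored parameters would be the dense `basis` plus the nonzero entries of each matrix in `per_block`; the cost of storing sparse indices is not modeled.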
Our experiments demonstrate that FiPS can reduce the size of Vision Transformers (ViTs) by as much as 33% with a top-1 accuracy drop of less than 1% on ImageNet-1k. With fine-tuning, the size reduction reaches up to 57%. For Large Language Models (LLMs), FiPS reduces model size by up to 20% and surpasses existing SVD-based techniques in both perplexity and downstream benchmark performance at equivalent compression levels. Furthermore, when paired with Quantization-Aware Training (QAT), a 3-bit FiPS implementation on the Gemma-2-2B model yields lower perplexity than 2-bit QAT alone at the same 8x compression ratio. These findings indicate that fine-grained parameter sharing is a viable and efficient strategy for compressing transformer MLPs.
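A back-of-the-envelope calculation shows how a 3-bit model can match the 8x ratio of 2-bit quantization. It rests on assumptions the abstract does not state: a 16-bit baseline (chosen because 16 / 2 = 8), a ratio counted over weight bits only, and no overhead for sparse indices or uncompressed layers. The helper `compression_ratio` is hypothetical.

```python
def compression_ratio(baseline_bits, quant_bits, param_fraction=1.0):
    """Baseline weight bits divided by compressed weight bits.

    `param_fraction` is the share of parameters kept after compression.
    """
    return baseline_bits / (quant_bits * param_fraction)

# 2-bit quantization with every parameter kept (assumed 16-bit baseline).
qat_2bit = compression_ratio(16, 2)

# Share of parameters a 3-bit model may keep to reach the same ratio.
required_fraction = 16 / (3 * qat_2bit)

# Ratio of a 3-bit model that keeps exactly that share of parameters.
fips_3bit = compression_ratio(16, 3, required_fraction)
```

Under these assumptions `required_fraction` is two thirds, so the extra bit per weight would be paid for by removing roughly one third of the parameters.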
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





