Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression
Title: Reassessing Tensor Decompositions for Post-Training Compression of Large Language Models
Abstract: Deploying large language models (LLMs) under strict resource constraints requires effective post-training compression. Tensor decompositions have gained traction as one such approach, offering compact parameterizations that align well with the structure of Transformer weights. However, prior work has largely evaluated them in limited settings, leaving open whether tensorization remains effective at deployment scale. This study conducts a comprehensive assessment of tensor compression techniques across both dense and Mixture-of-Experts (MoE) architectures, characterizing their performance trade-offs through combined empirical and theoretical analysis. Our findings reveal a critical mismatch: the shared subspaces inherent in tensor decompositions do not align with the heterogeneous representations learned by contemporary LLMs. This result delineates the practical limits of these methods and clarifies their appropriate role in large-scale applications. The source code is available at https://github.com/brain-lab-research/TT-LLM.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
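
For readers unfamiliar with the kind of parameterization the abstract refers to, the following is a minimal sketch of one common tensor decomposition, the tensor-train (TT) format, computed by sequential truncated SVDs. It is an illustration under stated assumptions, not the paper's method or the code in the linked repository: the function names tt_svd and tt_reconstruct, the max_rank parameter, the 64x64 random matrix, and the (8, 8, 8, 8) reshaping are all hypothetical choices made here. Every slice of the tensor is expressed through the same small set of cores, which is the sense in which such formats rely on shared subspaces.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Split a d-way array into tensor-train cores via sequential truncated SVDs.

    Each returned core has shape (rank_in, mode_size, rank_out).
    """
    shape = tensor.shape
    cores = []
    rank_prev = 1
    unfolding = tensor.reshape(rank_prev * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank = min(max_rank, s.size)  # truncate to the requested TT rank
        cores.append(u[:, :rank].reshape(rank_prev, shape[k], rank))
        # Carry the truncated remainder forward and fold in the next mode.
        unfolding = (s[:rank, None] * vt[:rank]).reshape(rank * shape[k + 1], -1)
        rank_prev = rank
    cores.append(unfolding.reshape(rank_prev, shape[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the cores back into a dense array."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.squeeze(axis=(0, -1))  # drop the boundary ranks of size 1


# Hypothetical example: a random 64x64 "weight matrix" viewed as an 8x8x8x8 tensor.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 64))
cores = tt_svd(weight.reshape(8, 8, 8, 8), max_rank=4)

approx = tt_reconstruct(cores).reshape(64, 64)
n_params = sum(core.size for core in cores)
rel_error = np.linalg.norm(weight - approx) / np.linalg.norm(weight)
print(f"parameters: {weight.size} -> {n_params}, relative error: {rel_error:.3f}")
```

A random matrix has no low-rank structure, so the reconstruction error in this toy case is large. The sketch only shows the mechanics of the parameter reduction and says nothing about how trained LLM weights behave.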



