Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks
Title: Achieving Exponential Compression in Deep Neural Networks via Automatically Differentiable Nonlinear Tensor Networks (ADNTNs)
Abstract
This study investigates Automatically Differentiable Nonlinear Tensor Networks (ADNTNs), a class of structured weight generators in which compact core tensors are optimized end-to-end using reverse-mode automatic differentiation (AD). Conceptually, ADNTNs extend the principles of tensor factorization and low-rank adaptation. Rather than relying on a single low-rank matrix update, an ADNTN constructs large weight tensors from a hierarchy of small core tensors, nonlinear activation functions, and optional lateral mixing tensors.
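As a rough illustration of the weight-generator idea, the following minimal sketch builds a 16 x 16 weight matrix from a two-level binary tree of small cores. It is not the paper's implementation: the function name `ttn_weight`, the core shapes, the bond dimension `chi`, and the placement of the `tanh` nonlinearity are all assumptions chosen for brevity, and only the forward contraction is shown (in an AD framework, the cores would be the trainable parameters).

```python
import numpy as np

def ttn_weight(leaf_a, leaf_b, top, act=np.tanh):
    """Generate a (d*d, d*d) weight matrix from a two-level binary tree (hypothetical layout)."""
    # Contract the top core into the right leaf: (chi, chi) x (d, d, chi) -> (chi, d, d)
    right = np.einsum("ab,klb->akl", top, leaf_b)
    # Elementwise nonlinearity on the intermediate tensor (placement is an assumption)
    right = act(right)
    # Contract with the left leaf: (d, d, chi) x (chi, d, d) -> (d, d, d, d)
    full = np.einsum("ija,akl->ijkl", leaf_a, right)
    d = leaf_a.shape[0]
    # Group legs (i, j) as rows and (k, l) as columns
    return full.reshape(d * d, d * d)

rng = np.random.default_rng(0)
d, chi = 4, 2  # illustrative leg and bond dimensions
leaf_a = rng.normal(size=(d, d, chi))
leaf_b = rng.normal(size=(d, d, chi))
top = rng.normal(size=(chi, chi))

W = ttn_weight(leaf_a, leaf_b, top)
n_core = leaf_a.size + leaf_b.size + top.size
print(W.shape, n_core, W.size)  # (16, 16) 68 256
```

In this toy setting 68 core parameters generate 256 weights. With only two leaves, the matrix remains low-rank across the top bond (rank at most `chi`), so the sketch shows only the mechanics of generating weights from cores, not the expressiveness of a deeper hierarchy.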
The paper highlights three architectures: Tree Tensor Networks (TTNs), augmented TTNs (aTTNs) incorporating boundary disentanglers, and the Multi-scale Entanglement Renormalisation Ansatz (MERA). The formulation is versatile, supporting nonlinear activations, task-specific objectives, batching, and execution schedules tailored to hardware constraints. However, the paper maintains a crucial distinction: differentiating a contraction program is not equivalent to eliminating the computational cost of contractions. Automatic differentiation does not circumvent the cost of large intermediate tensors, suboptimal contraction orders, or the exact contraction of general loopy tensor networks.
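The point about contraction order can be made concrete with the simplest case, a chain of three matrices. The sketch below is an illustration and is not taken from the paper: the helper `chain_cost` and the chosen shapes are hypothetical. It counts multiply-adds and the size of the intermediate for the two possible orderings of A(m,k) B(k,n) C(n,p).

```python
def chain_cost(m, k, n, p):
    """Multiply-add counts and intermediate sizes for A(m,k) @ B(k,n) @ C(n,p)."""
    # (A @ B) @ C: forms an m x n intermediate first
    left_first = {"flops": m * k * n + m * n * p, "intermediate": m * n}
    # A @ (B @ C): forms a k x p intermediate first
    right_first = {"flops": k * n * p + m * k * p, "intermediate": k * p}
    return left_first, right_first

left, right = chain_cost(m=1000, k=10, n=1000, p=10)
print(left)   # {'flops': 20000000, 'intermediate': 1000000}
print(right)  # {'flops': 200000, 'intermediate': 100}
```

Both orderings compute the same product, and both can be differentiated, yet they differ by a factor of 100 in arithmetic and 10,000 in intermediate size. Reverse-mode AD generally has to retain or recompute such intermediates for the backward pass, so a poor schedule remains expensive after differentiation.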
Extensive simulations conducted on layers from AlexNet and VGG-16 demonstrate significant efficiency gains. In the tested configurations, per-layer compression ratios ranged from approximately $2000\times$ to $77000\times$. In many instances, model accuracy matched that of dense baselines, and in several VGG-16 scenarios, it even surpassed them. While these findings are preliminary rather than definitive, they indicate that ADNTNs offer a promising, mathematically rigorous, and hardware-conscious pathway to significantly smaller neural networks, contingent upon the co-design of optimization strategies, contraction schedules, and deployment kernels.
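To give a sense of scale for the reported ratios, the short calculation below asks how many core parameters a single layer could retain at each end of the stated range. The 4096 x 4096 layer shape is an assumption used only for illustration (it is the size of a fully connected layer in VGG-16); the abstract does not state which layers produced which ratio, and `core_budget` is a hypothetical helper.

```python
def core_budget(dense_params, compression_ratio):
    """Largest whole number of core parameters that achieves at least the given ratio."""
    return dense_params // compression_ratio

dense = 4096 * 4096  # assumed layer shape, for illustration only
print(dense)                      # 16777216
print(core_budget(dense, 2000))   # 8388
print(core_budget(dense, 77000))  # 217
```

Under this assumption, the reported range corresponds to replacing roughly 16.8 million dense weights with between a few hundred and a few thousand core parameters.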
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




