SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks
Title: SaluNet: Unlocking Full Plasticity in Normalization-Free Deep Architectures
Abstract: For years, normalization techniques like LayerNorm and BatchNorm have been viewed as indispensable components for ensuring stable training in deep neural networks. However, this study reveals that these layers are not strictly necessary, as they can be entirely substituted by a single learnable activation mechanism. We highlight a phenomenon termed "plasticity suppression," where the adaptability of learnable activation parameters deteriorates rapidly when standard normalization is applied. In response, we present SALU (Saturated Adaptive Linear Unit), defined by the formula:
\[ \operatorname{SALU}(x;a,b) = \frac{a x}{\sqrt{1 + a b x^2}},\quad a>0,\; b>0 \]
This bounded, learnable activation function stabilizes signals intrinsically, eliminating the need for batch-dependent statistics or external affine parameters. Leveraging SALU, we introduce SaluNet, a framework built on the principle of total plasticity. Within this architecture, SALU takes the place of normalization layers, while SWALU and GALU serve as replacements for conventional activation functions.
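To make the boundedness claim concrete: from the definition, SALU is odd in x, has slope a at the origin, and approaches ±sqrt(a/b) as |x| grows, so its output never leaves the interval (-sqrt(a/b), sqrt(a/b)). The following is a minimal scalar sketch of the formula in plain Python, not the authors' implementation. The function names `salu` and `salu_bound` are hypothetical. The abstract does not say how the constraints a > 0 and b > 0 are enforced during training, so the sketch simply validates them.

```python
import math


def salu(x: float, a: float, b: float) -> float:
    """Evaluate SALU(x; a, b) = a*x / sqrt(1 + a*b*x^2) for a > 0, b > 0.

    Hypothetical helper for illustration only; a trainable layer would
    treat a and b as learnable parameters and operate on tensors.
    """
    if a <= 0 or b <= 0:
        raise ValueError("SALU requires a > 0 and b > 0")
    # sqrt(1 + a*b*x^2) == hypot(1, sqrt(a*b)*x); hypot avoids overflow in x*x.
    return a * x / math.hypot(1.0, math.sqrt(a * b) * x)


def salu_bound(a: float, b: float) -> float:
    """Return sqrt(a/b), the value |SALU| approaches as |x| -> infinity."""
    if a <= 0 or b <= 0:
        raise ValueError("SALU requires a > 0 and b > 0")
    return math.sqrt(a / b)


if __name__ == "__main__":
    a, b = 2.0, 0.5  # arbitrary illustrative values, not taken from the paper
    for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
        print(f"x={x:8.1f}  SALU={salu(x, a, b): .6f}")
    print("bound:", salu_bound(a, b))
```

In this sketch, a sets the gain near zero and the ratio a/b sets the saturation level, which is consistent with the abstract's description of an activation that limits signal magnitude without batch statistics.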
Our experiments demonstrate significant performance gains. Using ResNet-18, the normalization-free SaluNet-C-18 model achieves 97.35% accuracy on CIFAR-10 and 83.25% on CIFAR-100. Notably, it maintains robust performance of 93.44% and 76.23%, respectively, at a batch size of 1, a scenario where traditional normalized architectures typically fail. In transformer models, SaluNet-T surpasses the LayerNorm-GELU baseline, improving accuracy from 90.92% to 91.01% on CIFAR-10 and from 66.54% to 68.10% on CIFAR-100. Furthermore, SaluNet-C-50 attains a Top-1 accuracy of 78.67% on ImageNet-1K at a resolution of $224\times224$ and 79.23% at $288\times288$. These findings indicate that normalization layers hinder total plasticity, a capability that is naturally inherent in biological neurons and crucial for the effective learning of deep networks.
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC





