arXiv

SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks

Title: SaluNet: Unlocking Full Plasticity in Normalization-Free Deep Architectures

Abstract: For years, normalization techniques like LayerNorm and BatchNorm have been viewed as indispensable components for ensuring stable training in deep neural networks. However, this study reveals that these layers are not strictly necessary, as they can be entirely substituted by a single learnable activation mechanism. We highlight a phenomenon termed "plasticity suppression," where the adaptability of learnable activation parameters deteriorates rapidly when standard normalization is applied. In response, we present SALU (Saturated Adaptive Linear Unit), defined by the formula:

$$\operatorname{SALU}(x;a,b) = \frac{a x}{\sqrt{1 + a b x^2}},\quad a>0,\; b>0$$

This bounded, learnable activation function stabilizes signals intrinsically, eliminating the need for batch-dependent statistics or external affine parameters. Leveraging SALU, we introduce SaluNet, a framework built on the principle of total plasticity. Within this architecture, SALU takes the place of normalization layers, while SWALU and GALU serve as replacements for conventional activation functions.
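The boundedness claim can be checked directly from the formula: near zero the function behaves like $a x$, and as $|x| \to \infty$ it approaches $\pm\sqrt{a/b}$. The following is a minimal scalar sketch in plain Python, not the authors' implementation; the helper names `salu` and `salu_bound` and the parameter values are illustrative assumptions.

```python
import math


def salu(x: float, a: float = 1.0, b: float = 1.0) -> float:
    """SALU(x; a, b) = a*x / sqrt(1 + a*b*x^2), with a > 0 and b > 0."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return a * x / math.sqrt(1.0 + a * b * x * x)


def salu_bound(a: float, b: float) -> float:
    """Limit of |SALU(x; a, b)| as |x| grows: sqrt(a / b) (hypothetical helper)."""
    return math.sqrt(a / b)


# Illustrative parameter values, not taken from the paper.
a, b = 2.0, 0.5
bound = salu_bound(a, b)

for x in (-1e6, -3.0, 0.0, 3.0, 1e6):
    y = salu(x, a, b)
    # Each output depends only on its own input, so no batch statistics are involved.
    print(f"x={x:>12}: SALU={y:+.6f}  (bound {bound:.1f})")
```

Because each output depends only on its own input and the two parameters, the same computation applies unchanged at a batch size of 1.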

Our experiments show gains across convolutional and transformer architectures. Using ResNet-18, the normalization-free SaluNet-C-18 model achieves 97.35% accuracy on CIFAR-10 and 83.25% on CIFAR-100. Notably, it retains 93.44% and 76.23%, respectively, at a batch size of 1, a setting in which traditional normalized architectures typically fail. In transformer models, SaluNet-T surpasses the LayerNorm-GELU baseline, improving accuracy from 90.92% to 91.01% on CIFAR-10 and from 66.54% to 68.10% on CIFAR-100. Furthermore, SaluNet-C-50 attains a Top-1 accuracy of 78.67% on ImageNet-1K at a resolution of $224\times224$ and 79.23% at $288\times288$. These findings indicate that normalization layers hinder total plasticity, a capability naturally inherent in biological neurons, which is crucial for the effective learning of deep networks.


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
