arXiv

Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising

June 2, 2026 · Youssef Saied, François Fleuret

Title: Enabling Normalization Equivariance for Any Backbone Architecture: An Approach Applied to Image Denoising

Abstract: Normalization Equivariance (NE) serves as a structural prior designed to enhance resilience against distribution shifts in image-to-image applications. A function $f$ is defined as normalization equivariant if it satisfies the condition $f(a y + b\mathbf{1}) = a f(y) + b\mathbf{1}$ for all scalars $a>0$ and $b\in\mathbb{R}$. Previous approaches to NE required restricting every internal layer to operations compatible with NE, a constraint that increases computational runtime and prevents the use of common transformer elements like LayerNorm and softmax attention. To address these limitations, we propose Wrapped Normalization Equivariance (WNE), a parameter-free mechanism that normalizes the input, processes it through any chosen backbone network, and subsequently denormalizes the result. We demonstrate that every NE function can be represented by this factorization, meaning the wrapper precisely characterizes the entire class of NE functions. In experiments involving blind denoising, applying this wrapping technique to both CNN and transformer models significantly boosts robustness to noise-level mismatches without introducing any detectable GPU overhead. In contrast, baseline architectures with built-in NE constraints were found to be up to $1.6\times$ slower.
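The normalize, apply, denormalize pattern described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say which statistics the wrapper uses, so the per-input mean and standard deviation below are an assumption, and `wrap_ne` and `toy_backbone` are hypothetical names. Plain Python lists stand in for image tensors.

```python
import math
from statistics import fmean, pstdev
from typing import Callable, List

Signal = List[float]  # stand-in for a flattened image


def wrap_ne(backbone: Callable[[Signal], Signal]) -> Callable[[Signal], Signal]:
    """Wrap an arbitrary backbone so that f(a*y + b) = a*f(y) + b for a > 0.

    Assumption: normalization uses the input's mean and population
    standard deviation; the source does not specify the statistics.
    """

    def wrapped(y: Signal) -> Signal:
        mu = fmean(y)
        sigma = pstdev(y)
        if sigma == 0.0:
            # Constant input b*1: the NE condition itself forces f(b*1) = b*1,
            # so the input is returned unchanged.
            return list(y)
        z = [(v - mu) / sigma for v in y]        # normalize
        out = backbone(z)                        # any backbone, NE or not
        return [sigma * v + mu for v in out]     # denormalize

    return wrapped


def toy_backbone(z: Signal) -> Signal:
    """Hypothetical backbone that is clearly not NE on its own."""
    return [math.tanh(v) for v in z]


# Check the NE condition numerically for one choice of a > 0 and b.
f = wrap_ne(toy_backbone)
y = [0.2, 0.9, 0.4, 0.7]
a, b = 3.0, -1.5
lhs = f([a * v + b for v in y])          # f(a*y + b*1)
rhs = [a * v + b for v in f(y)]          # a*f(y) + b*1
assert all(math.isclose(l, r, abs_tol=1e-9) for l, r in zip(lhs, rhs))
```

Under this assumption, the mean of $a y + b\mathbf{1}$ is $a\mu + b$ and its standard deviation is $a\sigma$, so the backbone sees the same normalized input for every $a>0$ and $b\in\mathbb{R}$, and the denormalization step restores the scale and shift. The backbone itself (here `tanh`) does not need to be NE.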
