arXiv

Multigrade Neural Network Approximation

June 2, 2026 · Shijun Zhang, Zuowei Shen, Yuesheng Xu

Abstract: This paper investigates Multigrade Deep Learning (MGDL) as a rigorous framework for structured error refinement within deep neural networks. Although the approximation capabilities of neural networks are well-established, training extremely deep models remains difficult due to optimization landscapes that are often ill-conditioned and highly nonconvex. Conversely, training shallower networks—particularly specific one-hidden-layer ReLU models—can be reformulated as convex problems with global guarantees under suitable conditions. These findings inspire learning paradigms that enhance stability while allowing for increased depth. MGDL leverages this concept by training deep networks incrementally: once a grade is learned, it is frozen, and a new grade-wise subnetwork is added on top to approximate the residual error of the current approximation. This creates a hierarchical refinement process that is both interpretable and structured. We establish an operator-theoretic basis for MGDL and demonstrate that for any continuous target function on a hypercube, there exists a fixed-width multigrade ReLU scheme. In this scheme, residuals decrease pointwise in magnitude and converge uniformly to zero, with strict $L^p$-norm decay at each nontrivial grade for $p\in [1,\infty)$. To our knowledge, this study offers the first rigorous constructive approximation guarantee proving that a grade-wise residual refinement approach can achieve vanishing error within a fixed-width multigrade ReLU architecture.
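The grade-wise refinement described in the abstract can be illustrated with a minimal sketch. This is not the construction from the paper: as a simplifying assumption, each grade here uses random, frozen hidden-layer weights and solves only its linear output layer by least squares, so every grade is a convex problem. The names `fit_grade` and `multigrade_fit`, the width, the number of grades, and the sine target are all hypothetical choices made for illustration. Each new grade takes the frozen hidden features of the previous grade as input and fits the current residual.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def fit_grade(features, residual, width, rng):
    """Fit one grade on top of frozen input features.

    Assumption for this sketch: hidden weights are random and kept fixed;
    only the linear output layer (with bias) is fitted by least squares.
    """
    d = features.shape[1]
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    b = rng.normal(size=width)
    hidden = relu(features @ W + b)
    # Constant column plays the role of the output bias.
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return hidden, design @ coef


def multigrade_fit(x, y, num_grades=5, width=32, seed=0):
    """Return the final residual and the RMS residual after each grade."""
    rng = np.random.default_rng(seed)
    features = x
    residual = y.copy()
    norms = [float(np.sqrt(np.mean(residual ** 2)))]
    for _ in range(num_grades):
        hidden, correction = fit_grade(features, residual, width, rng)
        residual = residual - correction  # the next grade targets what is left
        features = hidden                 # frozen features feed the next grade
        norms.append(float(np.sqrt(np.mean(residual ** 2))))
    return residual, norms


# Hypothetical target on [0, 1]: one period of a sine wave.
x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x).ravel()
residual, norms = multigrade_fit(x, y)
print(norms)
```

Because the zero correction is always a feasible least-squares solution, the sampled root-mean-square residual in this sketch cannot increase from one grade to the next, up to floating-point error. This mirrors the monotone residual decay in the abstract only loosely: the paper's guarantee concerns pointwise and $L^p$ decay for a constructed fixed-width ReLU scheme, not this random-feature simplification.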

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC