Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks
Abstract
A prevailing heuristic for explaining why first-order gradient methods generalize in non-convex neural networks is the principle that "flat interpolators generalize effectively" (Hochreiter and Schmidhuber, 1994; Keskar et al., 2017). In this context, flatness is typically quantified by the trace of the Hessian of the empirical loss. However, Dinh et al. (2017) demonstrated that by exploiting network symmetries, one can alter the flatness of a model without affecting either the empirical or the population loss. Consequently, any interpolator can be made sharper or flatter at will, which renders the heuristic vacuous as stated.
In this study, we investigate the learning of an unknown multi-index model using two-layer non-convex homogeneous neural networks. We demonstrate that a connection between flatness and generalization persists despite the presence of these symmetries. This relationship specifically concerns the "flattest" interpolators, that is, those achieving the orderwise minimum flatness among all possible interpolators.
First, we identify a natural class of interpolators that fail to generalize, showing that their flatness cannot be improved to approach the theoretical minimum, even when symmetries are utilized. Second, we prove that for data generated by a sum of single-index models, any flattest interpolator achieves a small population loss, provided that both approximation error and label noise are low. Thus, the flattest interpolators consistently generalize. This finding establishes a direct link between flatness and generalization that holds for a broad spectrum of activation functions and realistic data distributions.
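The symmetry argument attributed to Dinh et al. (2017) above can be illustrated with a minimal numerical sketch. It is not taken from the paper: the ReLU activation, the squared loss, the random data, and the helper names `predict`, `hessian_trace`, and `rescale` are all assumptions chosen for illustration. For a two-layer ReLU network, multiplying the first-layer weights by c > 0 and dividing the second-layer weights by c leaves every prediction unchanged, while the trace of the Hessian of the empirical squared loss changes.

```python
import numpy as np

def predict(a, W, X):
    """Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x) (assumed architecture)."""
    return np.maximum(X @ W.T, 0.0) @ a

def hessian_trace(a, W, X):
    """Trace of the Hessian of the squared loss (1/2n) * sum_i (f(x_i) - y_i)^2.

    For ReLU, the second derivative of f with respect to any single parameter
    vanishes almost everywhere, so the residual term contributes nothing to the
    trace and the result is the mean squared gradient norm of f. The labels
    y_i therefore do not appear.
    """
    pre = X @ W.T                                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                          # df/da_j = relu(w_j . x)
    gate = (pre > 0).astype(float)                      # relu'(w_j . x)
    sq_norm_x = np.sum(X ** 2, axis=1, keepdims=True)   # (n, 1) values ||x_i||^2
    grad_a_sq = act ** 2                                # squared gradient w.r.t. a_j
    grad_w_sq = (a ** 2) * gate * sq_norm_x             # squared gradient norm w.r.t. w_j
    return float(np.mean(np.sum(grad_a_sq + grad_w_sq, axis=1)))

def rescale(a, W, c):
    """Symmetry of a positively homogeneous network: (a, W) -> (a / c, c * W), c > 0."""
    return a / c, c * W

# Hypothetical random data and weights, used only to exercise the functions.
rng = np.random.default_rng(0)
n, d, m = 50, 5, 8
X = rng.standard_normal((n, d))
a = rng.standard_normal(m)
W = rng.standard_normal((m, d))

for c in (1.0, 3.0, 10.0):
    a_c, W_c = rescale(a, W, c)
    unchanged = np.allclose(predict(a_c, W_c, X), predict(a, W, X))
    print(f"c={c:5.1f}  predictions unchanged: {unchanged}  "
          f"Hessian trace: {hessian_trace(a_c, W_c, X):.3f}")
```

Under these assumptions the trace takes the form c²·A + B/c² for data-dependent constants A and B, so it can be made arbitrarily large but cannot be pushed below 2·√(A·B) by rescaling alone. This is consistent with the abstract's focus on the minimum flatness attainable, rather than on the flatness of one particular parameterization.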
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






