Fast Generalization after Interpolation via Critically Damped Momentum Optimization
Title: Achieving Rapid Generalization Post-Interpolation Through Critically Damped Momentum Optimization
Abstract: A persistent challenge in machine learning is that models can attain near-perfect training accuracy yet fail to generalize to unseen data. This gap is particularly pronounced in high-dimensional, low-sample regimes, where many interpolating solutions exist and the optimizer must implicitly choose among minima with differing generalization ability. Building on recent theoretical insights into optimization dynamics around the interpolation threshold, we observe that the two-regime structure of risk minimization, an initial phase of loss reduction followed by a phase of complexity reduction, suggests a biphasic optimization schedule. We then prove that GROKtimizer, a biphasic method that combines fast convergence to interpolation with Critically Damped Momentum (CDM)-based norm minimization after interpolation, is a natural mechanism for selecting low-norm interpolating solutions. Within a local quadratic model of the post-interpolation basin, GROKtimizer achieves a quadratic speedup over standard gradient descent, which is provably optimal among first-order optimizers. To illustrate the method's practical utility, we evaluate GROKtimizer on several synthetic benchmarks common in classical grokking research and on various real-world datasets. Finally, we relate our results to the flat-minima hypothesis, underscoring the critical role of post-interpolation dynamics in obtaining models that generalize well.
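
The quadratic-speedup claim for a local quadratic model can be illustrated with a toy comparison. The sketch below is not the GROKtimizer algorithm, whose details the abstract does not give. As a stand-in it uses classical Polyak heavy-ball momentum, tuned so that the slowest and fastest eigenmodes are critically damped, and compares it with plain gradient descent on a diagonal quadratic. The function names `gd_steps` and `heavy_ball_steps`, the spectrum, the starting point, and the tolerance are all illustrative assumptions.

```python
import numpy as np

def gd_steps(eigs, x0, tol=1e-6, max_iter=100_000):
    """Iterations for plain gradient descent on f(x) = 0.5 * sum(eigs * x**2)."""
    mu, L = eigs.min(), eigs.max()
    lr = 2.0 / (L + mu)  # classical best fixed step size for a quadratic
    x = x0.copy()
    for k in range(max_iter):
        if np.linalg.norm(x) <= tol:
            return k
        x = x - lr * eigs * x  # the gradient of f at x is eigs * x
    return max_iter

def heavy_ball_steps(eigs, x0, tol=1e-6, max_iter=100_000):
    """Iterations for heavy-ball momentum with Polyak's tuning (assumed stand-in for CDM)."""
    sqrt_mu, sqrt_L = np.sqrt(eigs.min()), np.sqrt(eigs.max())
    lr = 4.0 / (sqrt_L + sqrt_mu) ** 2
    # With this momentum the extreme eigenmodes have a repeated root,
    # which is the discrete analogue of critical damping.
    beta = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2
    x, x_prev = x0.copy(), x0.copy()  # zero initial velocity
    for k in range(max_iter):
        if np.linalg.norm(x) <= tol:
            return k
        x_next = x - lr * eigs * x + beta * (x - x_prev)
        x_prev, x = x, x_next
    return max_iter

# Hypothetical local curvature spectrum with condition number kappa = 100.
eigs = np.linspace(1.0, 100.0, 20)
x0 = np.ones_like(eigs)

gd_iters = gd_steps(eigs, x0)
hb_iters = heavy_ball_steps(eigs, x0)
print(f"gradient descent: {gd_iters} iterations")
print(f"critically damped heavy ball: {hb_iters} iterations")
```

In this setting the per-step contraction factor of gradient descent is (kappa - 1)/(kappa + 1), while the tuned momentum method contracts at roughly (sqrt(kappa) - 1)/(sqrt(kappa) + 1), so the iteration count scales with sqrt(kappa) instead of kappa. That is the usual sense of a quadratic speedup for first-order methods on quadratics.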
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





