FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo
Title: FOAM: Adaptive Damping via Frequency and Operator Error to Mitigate Staleness-Induced Errors in Shampoo
Abstract:
The Shampoo optimizer has attracted significant interest for its strong performance on large-scale optimization benchmarks. However, its practical deployment is hindered by the substantial computational cost of computing the inverse matrix roots that form its preconditioner. To address this challenge, a common practice is to recompute the preconditioner only periodically and reuse stale values in between, a strategy that trades optimization accuracy for computational speed. This paper presents a theoretical analysis of the impact of staleness from two perspectives: convergence and stability. Although stale updates improve efficiency, they inevitably degrade performance and induce numerical instability. Our analysis reveals that damping acts as a crucial numerical stabilizer that can effectively counteract these detrimental effects. Building on these insights, we introduce FOAM, an adaptive algorithm designed to stabilize training. FOAM estimates the staleness-oriented error and uses this estimate to dynamically adjust both the damping factor and the frequency of eigendecomposition. Our experiments indicate that FOAM achieves shorter wall-clock times than standard Shampoo without sacrificing robust convergence.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
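To make the staleness and damping mechanics above concrete, here is a minimal sketch. It is not FOAM itself. It assumes standard two-sided Shampoo on a single matrix parameter, with inverse fourth roots computed by symmetric eigendecomposition. The roots are reused until a hypothetical drift measure exceeds a threshold; that measure is the relative spectral-norm change of the statistics since the last eigendecomposition. The names `damped_inverse_root`, `staleness_gap`, and `StaleShampoo` are illustrative assumptions, as are the parameters `drift_tol` and `max_interval`. The paper's actual error estimator and damping schedule are not reproduced here. `staleness_gap` measures how far a stale preconditioner is from a fresh one at a given damping value.

```python
import numpy as np


def damped_inverse_root(stat, damping, p=4):
    """Compute (stat + damping * I)^(-1/p) for a symmetric PSD matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(stat)
    # Clip small negative eigenvalues caused by round-off, then add the damping shift.
    eigvals = np.maximum(eigvals, 0.0) + damping
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def staleness_gap(stat_old, stat_new, damping, p=4):
    """Spectral-norm distance between the stale and the fresh damped preconditioner."""
    stale = damped_inverse_root(stat_old, damping, p)
    fresh = damped_inverse_root(stat_new, damping, p)
    return np.linalg.norm(fresh - stale, 2)


class StaleShampoo:
    """Shampoo for one matrix parameter whose inverse roots are reused until the
    statistics drift too far. The drift trigger is hypothetical, not FOAM's rule."""

    def __init__(self, shape, lr=0.1, damping=1e-3, drift_tol=0.1, max_interval=20):
        m, n = shape
        self.L, self.R = np.zeros((m, m)), np.zeros((n, n))
        self.lr, self.damping = lr, damping
        self.drift_tol, self.max_interval = drift_tol, max_interval
        self.L_snap = self.R_snap = None  # statistics at the last eigendecomposition
        self.L_root = self.R_root = None  # cached (possibly stale) inverse roots
        self.since_refresh = 0
        self.num_refreshes = 0

    def _drift(self):
        # Hypothetical operator-error proxy: relative spectral-norm change of the
        # statistics since the last refresh (damping keeps the denominator positive).
        def rel(new, old):
            return np.linalg.norm(new - old, 2) / (np.linalg.norm(old, 2) + self.damping)
        return max(rel(self.L, self.L_snap), rel(self.R, self.R_snap))

    def _refresh(self):
        # The expensive part: one eigendecomposition per side.
        self.L_root = damped_inverse_root(self.L, self.damping)
        self.R_root = damped_inverse_root(self.R, self.damping)
        self.L_snap, self.R_snap = self.L.copy(), self.R.copy()
        self.since_refresh = 0
        self.num_refreshes += 1

    def step(self, W, G):
        # Accumulate the left and right Kronecker-factor statistics.
        self.L += G @ G.T
        self.R += G.T @ G
        self.since_refresh += 1
        if (self.L_root is None or self.since_refresh >= self.max_interval
                or self._drift() > self.drift_tol):
            self._refresh()
        # Shampoo update with the cached roots: W <- W - lr * L^(-1/4) G R^(-1/4)
        return W - self.lr * self.L_root @ G @ self.R_root
```

For example, `staleness_gap(old, new, 1.0)` is at most 1, because both damped roots then have eigenvalues in (0, 1]. At damping `1e-4`, a direction that appears in the new statistics but not in the stale ones is still scaled by `1e-4 ** -0.25 = 10` in the stale root. The gap can therefore be several units large. This is one concrete way to see why larger damping makes a stale preconditioner less sensitive to drift.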




