Rethinking Bregman Divergences in Kronecker-Factored Optimizers
Title: Reevaluating Bregman Divergences within Kronecker-Factored Optimization Frameworks
Abstract
Shampoo-style optimizers use Kronecker-factored structures to approximate gradient covariance matrices. Lin et al. (2026) showed that these approximations can be interpreted as projections under Bregman matrix divergences, each of which yields a distinct Kronecker-factored preconditioner. However, the effect of the chosen divergence remains unclear when the underlying covariance matrix is not exactly Kronecker-factorizable. We study this question through the spectrum of the covariance matrix. We find that the Frobenius, von Neumann, and LogDet divergences distribute the unavoidable Kronecker approximation error differently across the covariance spectrum. We further show that the resulting Kronecker factors are determined by divergence-weighted residuals rather than by the raw approximation error, which explains how these spectral biases appear in the final preconditioners. Empirically, the dominant eigenspace of the covariance matrix aligns considerably more closely with the Hessian than the tail of the spectrum, which is noisy and unreliable. Building on these observations, we introduce a subspace-aware Kronecker optimizer that applies eigenvalue-based preconditioning in the dominant subspace and an adaptive isotropic acceleration constant in the residual subspace.
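To make the three divergences named in the abstract concrete, the following is a minimal NumPy sketch of their standard textbook definitions for symmetric positive-definite matrices, together with the classical Van Loan-Pitsianis rearrangement for the Frobenius-nearest Kronecker product. It is not the paper's algorithm: the function names (`spd_log`, `frobenius_div`, `von_neumann_div`, `logdet_div`, `nearest_kronecker_frobenius`) are hypothetical, and the subspace-aware optimizer itself is not reproduced here.

```python
import numpy as np


def spd_log(M):
    """Matrix logarithm of a symmetric positive-definite matrix via eigh."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T


def frobenius_div(X, Y):
    """Bregman divergence generated by phi(X) = 0.5 * ||X||_F^2."""
    return 0.5 * np.linalg.norm(X - Y, "fro") ** 2


def von_neumann_div(X, Y):
    """Bregman divergence generated by phi(X) = tr(X log X - X)."""
    return np.trace(X @ spd_log(X) - X @ spd_log(Y) - X + Y)


def logdet_div(X, Y):
    """Bregman divergence generated by phi(X) = -log det X."""
    n = X.shape[0]
    # Y^{-1} X has the same trace and determinant as X Y^{-1}.
    M = np.linalg.solve(Y, X)
    _, logabsdet = np.linalg.slogdet(M)
    return np.trace(M) - logabsdet - n


def nearest_kronecker_frobenius(C, m, n):
    """Frobenius-nearest kron(A, B) to C, with A (m x m) and B (n x n).

    Uses the Van Loan-Pitsianis rearrangement: after reshaping, an exact
    Kronecker product becomes a rank-1 matrix, so the best factors come
    from the leading singular pair. A and B are each defined only up to
    a joint sign and scale; their Kronecker product is what matters.
    """
    # Row (i, j) of R holds the vectorised n x n block C[i*n:(i+1)*n, j*n:(j+1)*n].
    R = C.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m, m)
    B = np.sqrt(s[0]) * Vt[0].reshape(n, n)
    return A, B
```

Under these assumptions, one could form `K = np.kron(*nearest_kronecker_frobenius(C, m, n))` for a covariance `C` that is not exactly Kronecker-factorizable and compare `frobenius_div(C, K)`, `von_neumann_div(C, K)`, and `logdet_div(C, K)`. The same residual is scored differently by each divergence, which is the starting point for the spectral analysis described in the abstract.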
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





