Mirror Descent Under Generalized Smoothness
Abstract: Fast convergence rates in first-order optimization typically rely on a smoothness assumption, yet many objectives in modern machine learning are non-smooth. Recent work relaxes smoothness by allowing the Lipschitz constant of the gradient to grow with the gradient’s norm, which covers a wider range of practical problems. These generalizations, however, remain tied to Euclidean geometry under the $\ell_2$-norm and provide theoretical guarantees only for optimization in Euclidean spaces. This work removes that restriction by introducing a new $\ell*$-smoothness notion, which measures the norm of the Hessian with respect to an arbitrary norm and its dual. We show that mirror-descent algorithms under this condition attain the same convergence rates as under classical smoothness. A key ingredient is a generalized self-bounding property, which bounds gradients by controlling suboptimality gaps and underpins our convergence proofs. We further derive tight convergence bounds for stochastic mirror descent that match the best-known results for classically smooth problems. The theory also extends to composite and non-convex optimization, and may shed light on the practical use of mirror descent in tasks such as the pre-training and post-training of Large Language Models (LLMs).
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
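
Illustrative note: the abstract refers to mirror-descent algorithms without spelling out an update rule. The following is a minimal sketch of one textbook instance, mirror descent on the probability simplex with the negative-entropy mirror map (often called exponentiated gradient). It is not the paper's algorithm or its $\ell*$-smoothness analysis. The function name `entropic_mirror_descent`, the toy quadratic objective, the step size, and the iteration count are all assumptions chosen for illustration.

```python
import math


def entropic_mirror_descent(grad, x0, step_size, num_steps):
    """Mirror descent on the probability simplex.

    Uses the negative-entropy mirror map, for which the update has the
    closed form x_i <- x_i * exp(-step_size * g_i), followed by
    renormalization so the iterate stays on the simplex.
    """
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        # Subtracting min(g) does not change the normalized result;
        # it only keeps the exponentials in a safe numeric range.
        g_min = min(g)
        w = [xi * math.exp(-step_size * (gi - g_min)) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x


# Hypothetical toy objective: f(x) = 0.5 * ||x - target||^2,
# whose minimizer over the simplex is `target` itself.
target = [0.7, 0.2, 0.1]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final = entropic_mirror_descent(
    grad, [1 / 3, 1 / 3, 1 / 3], step_size=0.5, num_steps=500
)
```

The multiplicative update keeps every iterate strictly positive and summing to one, so no explicit projection is needed. This is the kind of non-Euclidean geometry (here the $\ell_1$-norm and its dual) that motivates measuring smoothness in norms other than $\ell_2$.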





