Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients
Title: Robust Non-smooth Optimization via Safeguarded Stochastic Polyak Step Sizes: Eliminating Dependence on Small (Sub)Gradients
Abstract:
The Stochastic Polyak Step Size (SPS) has emerged as a highly effective step-size strategy for Stochastic Gradient Descent (SGD), matching the performance of state-of-the-art methods in both smooth convex and non-convex settings, including deep neural network training. Its extension to non-smooth settings, however, remains underexplored: existing extensions typically depend on interpolation assumptions or require prior knowledge of the optimal solution. To address these limitations, we introduce Safeguarded SPS (SPS$_{safe}$), a new variant for the stochastic subgradient method that offers rigorous convergence guarantees for non-smooth convex optimization without relying on strong assumptions. We further incorporate momentum into the update rule and obtain equally tight theoretical bounds. Experiments on convex benchmarks and deep neural networks support the theory: the proposed step size performs comparably to established adaptive baselines while remaining stable across diverse problem configurations. Notably, in deep learning applications, SPS$_{safe}$ prevents gradient norms from collapsing to (near) zero, making it robust to the vanishing gradient problem.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
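
For orientation, the following is a minimal sketch of the classical capped stochastic Polyak step (often written SPS_max in earlier work) applied to a stochastic subgradient method. It is not the SPS$_{safe}$ rule, whose formula is not given in the abstract. The toy objective f_i(x) = |a_i . x - b_i|, the function names, and the parameters c and gamma_max are illustrative assumptions. The sketch uses the known lower bound f_i^* = 0 and a consistent linear system, which is exactly the kind of interpolation setting the abstract says SPS$_{safe}$ is designed to avoid.

```python
import random


def sps_max_subgradient(A, b, x0, steps=2000, c=1.0, gamma_max=1.0, seed=0):
    """Stochastic subgradient method with a capped Polyak step.

    Illustrative objective (an assumption, not from the paper):
        f(x) = (1/n) * sum_i |a_i . x - b_i|,  with f_i^* = 0 as lower bound.
    Step size: gamma = min((f_i(x) - f_i^*) / (c * ||g||^2), gamma_max).
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        i = rng.randrange(len(A))
        a = A[i]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b[i]
        loss = abs(residual)  # f_i(x) - f_i^*, since f_i^* = 0 here
        if loss == 0.0:
            continue  # zero is a valid subgradient at the kink; no update
        sign = 1.0 if residual > 0 else -1.0
        g = [sign * aj for aj in a]  # a subgradient of |a . x - b|
        g_sq = sum(gj * gj for gj in g)
        if g_sq == 0.0:
            continue  # guard against division by zero for an all-zero row
        gamma = min(loss / (c * g_sq), gamma_max)
        x = [xj - gamma * gj for xj, gj in zip(x, g)]
    return x


def mean_abs_error(A, b, x):
    """Average of |a_i . x - b_i| over all rows."""
    total = 0.0
    for a, bi in zip(A, b):
        total += abs(sum(aj * xj for aj, xj in zip(a, x)) - bi)
    return total / len(A)


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    x_star = [1.0, -2.0]
    b = [sum(aj * xj for aj, xj in zip(a, x_star)) for a in A]
    x = sps_max_subgradient(A, b, [0.0, 0.0])
    print("final iterate:", x)
    print("mean absolute error:", mean_abs_error(A, b, x))
```

Note that the step is inversely proportional to the squared subgradient norm, so the cap gamma_max is the only thing limiting the step when the subgradient is small. The title's phrase "without small (sub)gradients" suggests that the safeguarded variant targets this sensitivity, though the abstract does not state how.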





