Edge of Stability Selectively Shapes Learning Across the Data Distribution
Abstract:
Existing work typically treats the edge of stability (EoS) as a universal property of the optimization process. We show instead that the phenomenon is selective: the stability constraint redistributes learning across subsets of the training data, accelerating progress on some groups while slowing it on others. Using a branching intervention that lets the model enter or exit the EoS regime from an identical initial state, we provide causal evidence of this trade-off and identify two conditions a group must satisfy to benefit. First, the group’s aggregate gradient must align with the principal Hessian eigenvector. We isolate this mechanism with a controlled perturbation that preserves distance but randomizes direction; breaking the alignment in this way eliminates the benefit. Second, the group must maintain a non-vanishing gradient magnitude throughout training. Under cross-entropy loss, gradient saturation causes confidently classified groups to decouple, which shifts the advantage to output-outliers, whose gradients persist. Together, these findings show that the EoS is not merely a stability boundary but a mechanism that governs how learning is allocated across the data distribution.
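The two conditions named in the abstract can be illustrated with a minimal sketch. It is not the authors' measurement code: the abstract does not say how alignment or gradient magnitude is computed, so the function names (`top_eigvec`, `alignment`, `ce_logit_grad_norm`), the 2x2 toy Hessian, and the group gradients below are all hypothetical. The sketch assumes alignment is the absolute cosine between a group's aggregate gradient and the principal Hessian eigenvector, and it measures saturation through the standard cross-entropy gradient with respect to the logits, softmax(z) - one_hot(y).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_logit_grad_norm(logits, label):
    # L2 norm of d(cross-entropy)/d(logits) = softmax(logits) - one_hot(label).
    probs = softmax(logits)
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))

def top_eigvec(H, iters=200):
    # Power iteration for the principal eigenvector of a symmetric PSD matrix
    # given as a list of rows. Assumes the start vector is not orthogonal to it.
    n = len(H)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def alignment(g, v):
    # Absolute cosine between a group gradient g and a direction v (assumed metric).
    dot = sum(a * b for a, b in zip(g, v))
    ng = math.sqrt(sum(a * a for a in g))
    nv = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (ng * nv)

# Hypothetical toy values: a Hessian whose sharpest direction is the first axis,
# and two group gradients, one mostly along that axis and one mostly across it.
H = [[4.0, 0.0], [0.0, 1.0]]
v_top = top_eigvec(H)
group_a = [1.0, 0.1]
group_b = [0.1, 1.0]
align_a = alignment(group_a, v_top)
align_b = alignment(group_b, v_top)

# Condition 2: a confidently classified example versus an uncertain one.
confident_norm = ce_logit_grad_norm([10.0, 0.0, 0.0], label=0)
outlier_norm = ce_logit_grad_norm([0.0, 0.0, 0.0], label=0)
```

In this toy setup, `group_a` lies almost entirely along the sharpest direction while `group_b` does not, and the confidently classified example has a near-zero logit gradient while the uniform-logit example retains a large one.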
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC



