When Do Fewer Coordinates Suffice in DP-SGD?
Title: Optimizing DP-SGD: The Case for Reduced Coordinate Updates
Abstract: Standard differentially private stochastic gradient descent (DP-SGD) adds noise to every coordinate of each update, so the total injected noise grows with the ambient parameter dimension $d$. We study when private training can restrict updates to a subset of coordinates while preserving the signal needed for optimization. We introduce \textsc{TP-TopK} (Two-Phase TopK DP-SGD), a framework for coordinate-sparse private training that requires no public data: a private warm-up phase identifies a coordinate support, which then restricts the updates of the main training phase. Our analysis gives a criterion for when restricting coordinates helps and shows, through a nonconvex stationarity bound, that when the criterion holds the noise term depends on the active dimension $k$ rather than the full dimension $d$. We also establish a lower bound on the reliability of coordinate rankings derived from warm-up scores. Experiments on MNIST, FMNIST, and CIFAR-10 indicate that learned coordinate supports capture more gradient energy than random supports of the same size, with the largest gains when the active dimension is small and the warm-up scores are highly informative.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
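
The two-phase structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the scoring rule, so the accumulated absolute noisy gradient used here is an assumption, as are all function names (`select_support`, `sparse_dp_sgd`, `energy_fraction`), the toy least-squares problem, and the hyperparameters. No privacy accounting is performed.

```python
import numpy as np

def clip_rows(g, clip_norm):
    """Rescale each per-example gradient (row) to L2 norm at most clip_norm."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

def private_mean_grad(g, clip_norm, sigma, rng):
    """Gaussian-mechanism estimate of the mean clipped gradient."""
    n, dim = g.shape
    noise = rng.normal(0.0, sigma * clip_norm, size=dim)
    return (clip_rows(g, clip_norm).sum(axis=0) + noise) / n

def select_support(grads_fn, w, k, warmup_steps, lr, clip_norm, sigma, rng):
    """Phase 1 (assumed form): dense DP-SGD warm-up that also scores coordinates."""
    scores = np.zeros_like(w)
    for _ in range(warmup_steps):
        g_hat = private_mean_grad(grads_fn(w), clip_norm, sigma, rng)
        scores += np.abs(g_hat)  # hypothetical score: accumulated |noisy gradient|
        w = w - lr * g_hat
    # Indices of the k largest scores; uses only already-privatized quantities.
    support = np.sort(np.argpartition(scores, -k)[-k:])
    return w, support

def sparse_dp_sgd(grads_fn, w, support, steps, lr, clip_norm, sigma, rng):
    """Phase 2: clip and add noise only on the k coordinates in `support`."""
    w = w.copy()
    for _ in range(steps):
        g = grads_fn(w)[:, support]  # restrict before clipping
        w[support] -= lr * private_mean_grad(g, clip_norm, sigma, rng)
    return w

def energy_fraction(grad, support):
    """Fraction of squared gradient norm carried by the chosen coordinates."""
    return float(np.sum(grad[support] ** 2) / np.sum(grad ** 2))

# Toy problem (an assumption for illustration): sparse linear regression.
rng = np.random.default_rng(0)
n, d, k = 256, 50, 5
w_true = np.zeros(d)
w_true[:k] = 3.0
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

def grads(w):
    """Per-example squared-loss gradients, shape (n, d)."""
    return (X @ w - y)[:, None] * X

w0, support = select_support(grads, np.zeros(d), k, 20, 0.05, 1.0, 1.0, rng)
w_final = sparse_dp_sgd(grads, w0, support, 100, 0.05, 1.0, 1.0, rng)

# Compare gradient energy on the learned support with a random one of equal size.
g0 = grads(np.zeros(d)).mean(axis=0)
random_support = rng.choice(d, size=k, replace=False)
learned_energy = energy_fraction(g0, support)
random_energy = energy_fraction(g0, random_support)
```

In phase 2 the noise vector has $k$ entries instead of $d$, so its expected squared norm is $k\sigma^2 C^2$ rather than $d\sigma^2 C^2$ for clipping norm $C$ and noise multiplier $\sigma$. Whether that saving outweighs the signal lost outside the support is the question the paper's criterion addresses.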




