DPDL: Towards Differential Privacy Preservation in Decentralized Stochastic Learning on Non-IID Data
Title: DPDL: Achieving Differential Privacy in Decentralized Stochastic Learning with Non-IID Data
Abstract:
In decentralized learning, multiple agents collaboratively train a global model on distributed datasets without a central server. Although many state-of-the-art studies have confirmed the benefits of such collaboration, the process requires extensive exchange of gradient information, which significantly increases the risk of privacy leakage for individual participants. Furthermore, in practice the training data are often not independent and identically distributed (non-IID) across agents, which further complicates privacy preservation in decentralized settings.
To tackle these challenges, we introduce DPDL, a decentralized learning algorithm designed to preserve privacy when data is non-IID. DPDL integrates Differential Privacy (DP) into cross-gradient aggregation through a calibration technique based on similarity metrics. In every training round, each agent first perturbs its cross-gradients (the derivatives of its neighbors' local models computed on its own private data) with a Gaussian noise mechanism before transmitting them to neighboring agents. The agent then uses cosine similarity to calibrate the perturbed cross-gradients it receives, so that the aggregated, calibrated cross-gradients can be used to update the local model in a manner analogous to momentum-based optimization.
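The round described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, since the abstract does not specify the exact rules. The least-squares model, the gradient clipping step, the choice to weight each received cross-gradient by its non-negative cosine similarity to the local gradient, and all function names (`lsq_grad`, `clip_and_perturb`, `calibrate_and_aggregate`, `momentum_step`) are assumptions made for illustration.

```python
import numpy as np


def lsq_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w.

    Evaluating a neighbor's parameters w on the agent's own (X, y) gives a
    cross-gradient in the sense used by the abstract. The least-squares
    model is an illustrative assumption.
    """
    return X.T @ (X @ w - y) / X.shape[0]


def clip_and_perturb(grad, clip_norm, sigma, rng):
    """Clip grad to L2 norm <= clip_norm, then add N(0, sigma^2 I) noise.

    Clipping is an assumption here: it bounds the sensitivity so that
    Gaussian noise can be calibrated to it.
    """
    norm = np.linalg.norm(grad)
    scale = min(1.0, clip_norm / (norm + 1e-12))
    return grad * scale + rng.normal(0.0, sigma, size=grad.shape)


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0.0 else float(a @ b / denom)


def calibrate_and_aggregate(local_grad, received):
    """Hypothetical calibration: weight each received cross-gradient by
    max(cosine similarity to local_grad, 0) and take the weighted mean.
    Falls back to local_grad if no received gradient is aligned with it.
    """
    weights = np.array(
        [max(cosine_similarity(local_grad, g), 0.0) for g in received]
    )
    total = weights.sum()
    if total == 0.0:
        return local_grad.copy()
    return (weights[:, None] * np.stack(received)).sum(axis=0) / total


def momentum_step(params, velocity, direction, lr, beta):
    """Heavy-ball style update: v <- beta * v + direction; w <- w - lr * v."""
    velocity = beta * velocity + direction
    return params - lr * velocity, velocity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_i, y_i = rng.normal(size=(8, 3)), rng.normal(size=8)  # private data
    w_i, v_i = np.zeros(3), np.zeros(3)
    neighbor_models = [rng.normal(size=3) for _ in range(2)]

    # Cross-gradients agent i would send out: neighbors' models, own data.
    outgoing = [
        clip_and_perturb(lsq_grad(w_j, X_i, y_i), 1.0, 0.5, rng)
        for w_j in neighbor_models
    ]

    # Stand-in for perturbed cross-gradients received from neighbors.
    received = [rng.normal(size=3) for _ in range(2)]
    direction = calibrate_and_aggregate(lsq_grad(w_i, X_i, y_i), received)
    w_i, v_i = momentum_step(w_i, v_i, direction, lr=0.1, beta=0.9)
    print(w_i)
```

In this sketch, received cross-gradients that point away from the agent's own gradient get zero weight. This is one simple way a similarity-based calibration could damp the effect of both injected noise and non-IID disagreement between agents.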
Our theoretical analysis establishes two results: first, it identifies the minimum noise level needed to guarantee a desired level of privacy protection; second, it proves that the algorithm retains a linear speedup in training, even on non-IID data. Finally, extensive experiments on real-world datasets validate that DPDL both resists privacy attacks and trains highly accurate models.
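To give a sense of what a minimum noise level looks like, the sketch below computes the classical Gaussian-mechanism calibration for a single release: for 0 < ε < 1, noise with standard deviation σ ≥ sqrt(2 ln(1.25/δ)) · Δ₂ / ε yields (ε, δ)-differential privacy, where Δ₂ is the L2 sensitivity of the released quantity. This is the textbook bound, not the bound derived for DPDL, which the abstract does not state. The function name `gaussian_sigma` and its parameters are hypothetical.

```python
import math


def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Classical Gaussian-mechanism noise scale for one (epsilon, delta)-DP
    release; valid for 0 < epsilon < 1. Not the DPDL-specific bound.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical bound requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if l2_sensitivity <= 0.0:
        raise ValueError("l2_sensitivity must be positive")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon
```

A stricter privacy target (smaller ε or δ) or a larger sensitivity requires proportionally more noise. Repeated releases over many training rounds also need a composition argument, which this single-release formula does not cover.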
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




