arXiv

DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data

Abstract:

Adapting large language models (LLMs) in settings characterized by limited resources and high privacy requirements presents significant hurdles. Because training datasets are frequently fragmented across various clients, decentralized fine-tuning emerges as a viable strategy for collaborative model adaptation without necessitating a central authority. Nevertheless, implementing full-parameter fine-tuning (FPFT) within a decentralized architecture is notoriously difficult. While FPFT delivers robust adaptation capabilities, it demands excessive computational resources for models comprising billions of parameters. Consequently, current decentralized approaches for LLM fine-tuning predominantly utilize parameter-efficient updates. Although these methods enhance efficiency, they can potentially compromise performance on downstream tasks. Furthermore, the prevalence of non-IID (non-independent and identically distributed) data among clients exacerbates the risks of client drift and convergence instability in decentralized optimization processes.

To overcome these obstacles, we introduce DECA, a framework designed for resource-efficient, decentralized full-parameter fine-tuning of LLMs operating on non-IID data. DECA functions by dividing model parameters into separate, non-overlapping blocks and executing sequential block-wise Adam optimization. This approach significantly lowers resource demands while maintaining the benefits of decentralized full-parameter adaptation. To ensure training stability, DECA incorporates first- and second-order block-wise moment estimates that leverage fresh local gradient statistics alongside discrepancy signals derived from consensus mechanisms. Through comprehensive theoretical analysis and extensive empirical evaluations, we demonstrate that DECA delivers rapid convergence, superior downstream performance, and marked improvements in resource efficiency.
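The block-partitioning idea can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify how DECA builds its moment estimates from consensus discrepancy signals, so that correction is omitted here. The sketch assumes a hypothetical setup in which each client minimizes a quadratic loss toward its own target vector (a stand-in for non-IID data), keeps Adam moments only for the currently active block, and averages that block with its two ring neighbours after each local phase. All function names and hyperparameters below are illustrative assumptions.

```python
import math

def partition_blocks(num_params, num_blocks):
    """Split indices 0..num_params-1 into contiguous, non-overlapping blocks."""
    base, extra = divmod(num_params, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

def local_block_adam(x, target, block, steps, lr=0.02,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a few Adam steps on the coordinates in `block` only (in place).

    Moments are allocated for the active block alone and discarded on return,
    so optimizer state scales with the block size, not the full model.
    """
    m = [0.0] * len(block)
    v = [0.0] * len(block)
    for t in range(1, steps + 1):
        for k, j in enumerate(block):
            g = x[j] - target[j]  # gradient of the toy loss 0.5 * (x_j - target_j)^2
            m[k] = beta1 * m[k] + (1 - beta1) * g
            v[k] = beta2 * v[k] + (1 - beta2) * g * g
            m_hat = m[k] / (1 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[k] / (1 - beta2 ** t)  # bias-corrected second moment
            x[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)

def ring_average(xs, block):
    """Gossip step: each client averages the active block with its two ring neighbours."""
    n = len(xs)
    mixed = [list(x) for x in xs]
    for i in range(n):
        left, right = xs[(i - 1) % n], xs[(i + 1) % n]
        for j in block:
            mixed[i][j] = (left[j] + xs[i][j] + right[j]) / 3.0
    return mixed

def run(targets, num_blocks=2, rounds=400, local_steps=5):
    """Cycle through blocks; per round: local block-wise Adam, then gossip on that block."""
    n, d = len(targets), len(targets[0])
    blocks = partition_blocks(d, num_blocks)
    xs = [[0.0] * d for _ in range(n)]  # every client starts from the same point
    for r in range(rounds):
        block = blocks[r % num_blocks]  # sequential (cyclic) block schedule
        for i in range(n):
            local_block_adam(xs[i], targets[i], block, local_steps)
        xs = ring_average(xs, block)  # only the active block is communicated
    return xs

if __name__ == "__main__":
    # Hypothetical non-IID setup: four clients with different target vectors.
    targets = [[2.0, -1.0, 3.0, 1.0], [2.5, -1.5, 3.5, 0.5],
               [1.5, -0.5, 2.5, 1.5], [2.0, -1.0, 3.0, 1.0]]
    for client_params in run(targets):
        print([round(p, 2) for p in client_params])
```

In this toy version, only the active block's moments exist at any time and only that block is exchanged between neighbours, which is the kind of memory and communication saving that block-wise schemes aim for. Plain neighbour averaging with locally reset moments does nothing to counteract client drift, which is the gap the consensus-derived discrepancy signals described above are meant to address.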


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
