FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data
Abstract: Reinforcement learning has recently emerged as the dominant approach for improving the self-correction and long-chain reasoning abilities of language models. Group Relative Policy Optimization (GRPO) scales well because it removes the need for a critic network, but running it centrally requires pooling large amounts of data from distributed sources, which creates substantial privacy risks. To address this, we present Federated GRPO (FGRPO), a decentralized framework for fine-tuning reasoning models across diverse data holders. FGRPO counters the instability caused by reward scales that vary across tasks with an adaptive aggregation mechanism driven by relative performance gains. By measuring each client’s progress against its own historical baseline, the system dynamically emphasizes productive learning trajectories regardless of local task complexity. This approach guarantees stable convergence on data that are not independent and identically distributed (non-IID) while maintaining strict data privacy.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
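
The adaptive aggregation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the exponential-moving-average baseline, the normalized gain, the softmax weighting, and every function and parameter name here (`relative_gain`, `update_baseline`, `aggregation_weights`, `aggregate`, `momentum`, `temperature`) are assumptions. Only the group-standardized advantage in `group_relative_advantages` follows the usual GRPO formulation.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group.

    No critic is needed; the group mean acts as the baseline.
    (Using the population standard deviation is a choice made for this sketch.)
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def relative_gain(current_reward, baseline, eps=1e-8):
    """ASSUMED gain measure: improvement over the client's own baseline,
    divided by the baseline magnitude so that reward scale cancels out."""
    return (current_reward - baseline) / (abs(baseline) + eps)


def update_baseline(baseline, current_reward, momentum=0.9):
    """ASSUMED historical baseline: exponential moving average of past rewards."""
    return momentum * baseline + (1.0 - momentum) * current_reward


def aggregation_weights(gains, temperature=1.0):
    """ASSUMED weighting rule: softmax over per-client relative gains."""
    top = max(gains)  # subtract the max for numerical stability
    exps = [math.exp((g - top) / temperature) for g in gains]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_updates, weights):
    """Weighted average of client parameter updates (flat lists of floats)."""
    dim = len(client_updates[0])
    return [
        sum(w * update[i] for w, update in zip(weights, client_updates))
        for i in range(dim)
    ]
```

Under these assumptions, a client whose mean reward moves from a baseline of 100 to 110 has a relative gain of about 0.1, while a client moving from 1.0 to 1.2 has a gain of about 0.2. The second client therefore receives the larger aggregation weight even though its absolute reward is far smaller, which is the scale-independence the abstract describes.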



