arXiv

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing

June 2, 2026 · Azal Ahmad Khan, Ammar Ahmed, Zeshan Fayyaz, Sheng Di, Mingyi Hong, Ali Anwar

Title: Enhancing Efficiency in Synchronous On-Policy RL Through Straggler-Adaptive Group Sizing

Abstract:

While synchronous reinforcement learning algorithms like Group Relative Policy Optimization (GRPO) offer reliable and reproducible on-policy training, they remain highly susceptible to stragglers. In these systems, a single long rollout can bottleneck the entire group, delaying both reward calculation and parameter updates. This synchronization delay grows with group size, creating a tension between the benefits of larger groups and the rising wall-clock cost of synchronization stalls.
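
To illustrate why the stall grows with group size, here is a minimal sketch under a simplifying assumption that is not taken from the paper: rollout durations are independent and exponentially distributed with a common mean. Under that assumption, the expected time for the slowest of G rollouts is the mean multiplied by the G-th harmonic number, so the synchronization point moves later as G grows. The function name `expected_group_latency` is hypothetical.

```python
def expected_group_latency(group_size: int, mean_rollout_time: float = 1.0) -> float:
    """Expected time until the slowest of `group_size` rollouts finishes.

    Assumption (illustrative, not from the paper): rollout durations are
    i.i.d. exponential with the given mean. For that distribution,
    E[max of G draws] = mean * (1 + 1/2 + ... + 1/G).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    harmonic = sum(1.0 / k for k in range(1, group_size + 1))
    return mean_rollout_time * harmonic


if __name__ == "__main__":
    # A synchronous step waits for the slowest rollout, so its expected
    # duration rises with the group size even though the mean stays fixed.
    for g in (1, 4, 16, 64):
        print(f"G={g:3d}  expected step latency = {expected_group_latency(g):.3f}")
```

Real rollout-length distributions are typically heavier-tailed than this toy model, so the sketch is meant only to show the direction of the effect, not its magnitude.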

To address this, we introduce Straggler-Aware Group Control (SAGC), a mechanism that dynamically adjusts the training group size in real time based on observed rollout performance. SAGC treats group-size selection as an online constrained optimization problem, aiming to preserve the advantages of larger groups while controlling the long-term frequency of straggler occurrences.
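
The abstract does not spell out the SAGC update rule, so the following is only a generic sketch of how an online constrained choice of group size could look, not the authors' algorithm. It uses a standard primal-dual (Lagrangian) scheme. Everything in it is an assumption made for illustration: the class name `GroupSizeController`, the logarithmic utility for larger groups, the per-size running estimate of the straggler rate, and the definition of a "straggler step" as a boolean flag supplied by the caller.

```python
import math


class GroupSizeController:
    """Hypothetical primal-dual controller over a fixed set of group sizes.

    Goal (illustrative): prefer larger groups while keeping the long-run
    fraction of straggler steps near `target_rate`.
    """

    def __init__(self, sizes, target_rate, step_size=0.1):
        self.sizes = sorted(sizes)
        self.target_rate = target_rate
        self.step_size = step_size
        self.dual = 0.0  # Lagrange multiplier on the straggler constraint
        self.counts = {g: 0 for g in self.sizes}
        # Optimistic start: assume no stragglers until observed.
        self.straggler_rate = {g: 0.0 for g in self.sizes}

    def choose(self):
        """Pick the size maximizing utility minus the priced straggler rate."""
        # log(g) is an assumed stand-in for the benefit of a larger group.
        return max(
            self.sizes,
            key=lambda g: math.log(g) - self.dual * self.straggler_rate[g],
        )

    def update(self, group_size, had_straggler):
        """Record one training step and take a projected dual-ascent step."""
        x = 1.0 if had_straggler else 0.0
        self.counts[group_size] += 1
        n = self.counts[group_size]
        # Running mean of the straggler indicator for this group size.
        self.straggler_rate[group_size] += (x - self.straggler_rate[group_size]) / n
        # The multiplier grows while violations exceed the target, shrinks otherwise.
        self.dual = max(0.0, self.dual + self.step_size * (x - self.target_rate))


if __name__ == "__main__":
    ctrl = GroupSizeController(sizes=[4, 8, 16], target_rate=0.1)
    for _ in range(10):
        g = ctrl.choose()
        # Toy environment: pretend only the largest group ever stalls.
        ctrl.update(g, had_straggler=(g == 16))
        print(g, round(ctrl.dual, 2))
```

In this sketch the multiplier acts as a price on stragglers: repeated stalls at a large group size raise the price until a smaller size scores higher, and stall-free steps let the price decay so the controller can return to larger groups.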

Our experiments demonstrate that SAGC consistently lowers the incidence of stragglers and improves wall-clock efficiency across both GRPO and DAPO training frameworks, whether applied to vanilla or robustly engineered baselines. These improvements come with competitive or superior training rewards. The benefits also extend to final model performance: SAGC matches or exceeds the strongest static group-size baselines on downstream reasoning benchmarks, often while generating shorter outputs without explicit length penalties. These findings establish dynamic group control as a viable strategy for improving the efficiency and resilience of synchronous on-policy reinforcement learning.

Source: arXiv