arXiv

Libra: Efficient Resource Management for Agentic RL Post-Training

Title: Libra: Optimizing Resource Allocation for Post-Training Agentic Reinforcement Learning

Abstract:

Reinforcement learning (RL) has emerged as a standard post-training framework for large language models (LLMs), extending their capabilities beyond preference alignment to complex reasoning and multi-turn agentic interaction. However, the rollout phase of agentic RL poses significant resource-management challenges. Because trajectories are generated through repeated tool invocations, this stage produces long-tailed, non-stationary workloads that violate the assumptions of traditional resource management.

Three challenges define this setting. First, trajectory lengths are long-tailed: a small fraction of trajectories accounts for most of the rollout makespan. Second, the rollout and training phases differ markedly in their sensitivity to sequence length, their memory requirements, and their compute patterns. Third, as the policy evolves, the distribution of trajectory lengths shifts over time, so any fixed split of resources between the two phases becomes increasingly inefficient.
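To make the first challenge concrete, the snippet below computes, for a hypothetical batch, how much of the makespan is spent waiting on the slowest trajectories. It is a simplified sketch, not Libra's model: it assumes every trajectory starts at time zero on its own slot, and the durations as well as the helpers `tail_share` and `slot_utilization` are illustrative names and numbers rather than measurements from the paper.

```python
def tail_share(durations, frac=0.1):
    """Fraction of the batch makespan during which only the slowest `frac`
    of trajectories are still running.

    Assumes every trajectory starts at t=0 on its own slot, so the batch
    finishes when the longest trajectory finishes.
    """
    d = sorted(durations)
    n = len(d)
    if n < 2:
        return 0.0
    m = max(1, min(n - 1, int(n * frac)))  # number of "tail" trajectories
    makespan = d[-1]
    # Once the (n - m)-th shortest trajectory finishes, only the m tail
    # trajectories are still active until the makespan ends.
    return (makespan - d[n - m - 1]) / makespan


def slot_utilization(durations):
    """Average fraction of the makespan during which a slot is busy."""
    return sum(durations) / (len(durations) * max(durations))


# Hypothetical batch (seconds): 18 short trajectories and 2 long, tool-heavy ones.
durations = [10] * 18 + [60, 100]
print(tail_share(durations, 0.1))   # 0.9
print(slot_utilization(durations))  # 0.17
```

With two long trajectories out of twenty, the slowest 10% of trajectories occupy 90% of the makespan and slots are busy only 17% of the time on average, which is the kind of imbalance that motivates moving idle rollout capacity elsewhere.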

To address these issues, we introduce Libra, a system built on two core mechanisms. The first is a periodic global resource planner that jointly optimizes GPU allocation across the rollout and training clusters; it uses an elastic hybrid pool to reallocate workers between stages quickly and without blocking. The second is a causality-driven multi-level feedback queue (C-MLFQ) scheduler. Rather than relying on unreliable length predictions, it routes requests to heterogeneous rollout buckets based on causal signals extracted from tool-return outcomes.
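The C-MLFQ idea can be illustrated with a small multi-level feedback queue in which every request starts at the highest-priority level and is demoted on observed events instead of predicted lengths. This is a hypothetical sketch: the abstract does not specify Libra's signals, level budgets, or bucket types, so the `"error"` tool outcome used as a causal demotion signal, the per-level token budgets in `quanta`, the `BUCKETS` names, and the `CausalMLFQ` class are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical bucket names: one rollout bucket per priority level, from
# latency-oriented (few concurrent requests) to throughput-oriented (large batches).
BUCKETS = ["latency", "balanced", "throughput"]


@dataclass
class Request:
    rid: int
    level: int = 0        # 0 = highest priority; every request starts here
    tokens_used: int = 0
    done: bool = False


class CausalMLFQ:
    def __init__(self, quanta):
        # quanta[i]: assumed token budget a request may use while at level i
        self.quanta = quanta
        self.queues = [deque() for _ in quanta]

    def submit(self, req):
        self.queues[req.level].append(req)

    def bucket_for(self, req):
        return BUCKETS[req.level]

    def next_request(self):
        # Strict priority: serve the first non-empty queue, starting at level 0.
        for q in self.queues:
            if q:
                return q.popleft()
        return None

    def on_turn_finished(self, req, new_tokens, tool_outcome):
        """Update a request after one model turn and its tool return, then re-enqueue it."""
        req.tokens_used += new_tokens
        if tool_outcome == "final":
            req.done = True  # trajectory ended; do not re-enqueue
            return
        last = len(self.quanta) - 1
        budget = sum(self.quanta[: req.level + 1])
        if tool_outcome == "error" or req.tokens_used > budget:
            # "error" is a stand-in causal signal: a failed tool call forces at
            # least one retry turn, so the request is demoted without waiting
            # for it to exhaust its cumulative token budget.
            req.level = min(req.level + 1, last)
        self.submit(req)


sched = CausalMLFQ(quanta=[512, 2048, 8192])
a, b = Request(rid=1), Request(rid=2)
sched.submit(a)
sched.submit(b)

first = sched.next_request()                         # request 1
sched.on_turn_finished(first, new_tokens=300, tool_outcome="error")
second = sched.next_request()                        # request 2: level 0 beats level 1
print(first.rid, sched.bucket_for(first))            # 1 balanced
print(second.rid, sched.bucket_for(second))          # 2 latency
```

Demoting on an observed tool outcome lets the scheduler react as soon as evidence appears, while the cumulative token budget acts as the classic MLFQ fallback for requests that grow long without emitting any such signal.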

Evaluations on 48 A800 GPUs show that Libra outperforms baseline methods, achieving up to 3.0$\times$ higher throughput and up to 2.5$\times$ faster reward convergence.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC