arXiv

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

arXiv:2606.04101v1 | Announcement Type: Cross

Abstract

As large-scale expert parallelism (EP) becomes essential for training and deploying frontier Mixture-of-Experts (MoE) models, it exacerbates device-level expert load imbalance, causing compute stragglers, token all-to-all bottlenecks, and spikes in activation memory usage. Existing balancers typically re-place experts at periodic intervals based on historical load data, an approach that is unreliable under the non-stationary load patterns seen in production.

To address these limitations, we introduce UltraEP, the first real-time, exact-load balancer for large-EP MoE training and prefilling on rack-scale nodes (RSNs). Leveraging the enhanced scale-up connectivity of RSNs, UltraEP rebalances every microbatch and every layer on the critical execution path. Because any rebalancing cost therefore adds directly to step time, plan solving and expert-replication communication must be co-designed to minimize overhead.

UltraEP responds immediately to post-gating load variations through efficient, quota-driven planning. It then executes the resulting irregular expert-state transfers using RSN-native persistent tile streaming and relay-based fan-out mitigation. Evaluated on MoE models ranging from 106B to 671B parameters in both training and prefilling, UltraEP attains 94.3% of the ideal throughput achieved by force-balanced systems. This represents a 1.49× speedup over systems without balancing, while lowering the final inter-rank imbalance from 1.30–4.01 to 1.01–1.04. We further demonstrate UltraEP's scalability and robustness through production MoE training experiments on 2,560 GPUs.
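
To make the idea of quota-driven planning concrete, the following Python sketch balances one layer's post-gating token counts. It is an illustration under stated assumptions, not UltraEP's actual algorithm: the abstract does not define the imbalance metric or the planner, so the max-over-mean ratio in `imbalance`, the per-rank quota of `ceil(total / num_ranks)`, and the greedy policy in `plan_with_quota` are all hypothetical choices, as are the function and parameter names.

```python
import math
from typing import Dict, List, Tuple


def imbalance(rank_loads: List[float]) -> float:
    """Max-over-mean load ratio (assumed metric); 1.0 means perfectly balanced."""
    mean = sum(rank_loads) / len(rank_loads)
    return max(rank_loads) / mean if mean > 0 else 1.0


def plan_with_quota(
    expert_tokens: List[int], home_rank: List[int], num_ranks: int
) -> Tuple[Dict[Tuple[int, int], int], List[int]]:
    """Hypothetical greedy planner.

    Each rank may process at most `quota` tokens. Ranks above quota shed the
    surplus of their hottest experts to the least-loaded ranks, which implies
    replicating those experts on the receiving ranks.
    Returns (plan, loads) where plan[(expert, rank)] is a token count.
    """
    quota = math.ceil(sum(expert_tokens) / num_ranks)
    loads = [0] * num_ranks
    for e, t in enumerate(expert_tokens):
        loads[home_rank[e]] += t
    plan = {(e, home_rank[e]): t for e, t in enumerate(expert_tokens)}

    # Visit experts from hottest to coldest.
    for e in sorted(range(len(expert_tokens)), key=lambda i: -expert_tokens[i]):
        src = home_rank[e]
        while loads[src] > quota and plan[(e, src)] > 0:
            dst = min(range(num_ranks), key=lambda r: loads[r])
            spare = quota - loads[dst]
            if spare <= 0:  # defensive: no rank has room left
                break
            moved = min(spare, loads[src] - quota, plan[(e, src)])
            plan[(e, src)] -= moved
            plan[(e, dst)] = plan.get((e, dst), 0) + moved
            loads[src] -= moved
            loads[dst] += moved
    return plan, loads


# One hot expert on rank 0, three cold experts on ranks 1-3.
tokens = [900, 50, 30, 20]
homes = [0, 1, 2, 3]
before = [900, 50, 30, 20]
plan, after = plan_with_quota(tokens, homes, num_ranks=4)
replicas = sorted(r for (e, r), t in plan.items() if e == 0 and r != 0 and t > 0)
print(imbalance(before), imbalance(after), replicas)
```

In this toy input the hot expert ends up replicated on every other rank, and each rank processes exactly the quota of 250 tokens. A real system would also have to account for the cost of moving expert state, which is the part the abstract attributes to tile streaming and relay-based fan-out mitigation and which this sketch does not model.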


Source: arXiv | Generated at: 2026-06-04 00:00:00 UTC
