arXiv

LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

Title: LoopMoE: Harmonizing Iterative Processing and Mixture-of-Experts for Language Modeling

Abstract: Mixture-of-Experts (MoE) and looped architectures offer distinct pathways for scaling models: the former increases parameter capacity, the latter increases effective depth. However, conventional looped designs typically use dense backbones, which couples parameter count to per-token FLOPs. This coupling makes it hard to isolate the effect of iterative computation under an equivalent budget. To address this limitation, we introduce LoopMoE, a looped MoE language model that combines sparse routing with iterative, weight-shared computation through two key innovations. First, IterAdaLN breaks the symmetry inherent in weight sharing by applying a modulation signal derived from both the per-token hidden state and the iteration index. Second, a capacity-balancing mechanism restores the attention-to-FFN active parameter ratio found in well-optimized, non-looped counterparts. Together, these enable the first rigorous, head-to-head comparison between a looped MoE and a standard vanilla MoE with identical total parameters, per-token FLOPs, and active sublayer ratios. At the 3B scale, LoopMoE surpasses the vanilla MoE on eight of nine downstream benchmarks, with an average improvement of more than one point. At the 9B scale, LoopMoE remains ahead of the matched vanilla MoE, indicating that the architectural benefits persist at larger sizes. This study provides a controlled integration of sparsity and recurrence and points to a viable trajectory for the development of looped language models.
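
The abstract describes IterAdaLN only at a high level, so the following is a minimal sketch of the general idea rather than the paper's implementation. It assumes a learned per-iteration embedding that is added to the token's hidden state and then projected to a scale and a shift; the class name `IterAdaLNSketch`, the additive conditioning, the weight shapes, and the stand-in `shared_block` are all hypothetical choices made for illustration.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Parameter-free layer norm over one token's hidden vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class IterAdaLNSketch:
    """Hypothetical iteration-aware adaptive layer norm (not the paper's code)."""

    def __init__(self, d_model, n_iters, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.iter_emb = init(n_iters, d_model)  # one vector per loop iteration
        self.w_scale = init(d_model, d_model)   # projects conditioning to a scale
        self.w_shift = init(d_model, d_model)   # projects conditioning to a shift

    def __call__(self, h, t):
        # Assumption: the conditioning signal is hidden state + iteration embedding.
        cond = [hv + ev for hv, ev in zip(h, self.iter_emb[t])]
        scale = matvec(self.w_scale, cond)
        shift = matvec(self.w_shift, cond)
        normed = layer_norm(h)
        return [n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)]


def looped_forward(h, shared_block, norm, n_iters):
    """Apply the same block n_iters times; only the norm sees the iteration index."""
    for t in range(n_iters):
        update = shared_block(norm(h, t))
        h = [hv + uv for hv, uv in zip(h, update)]  # residual connection
    return h
```

Because `shared_block` has the same weights on every pass, the iteration-dependent modulation is the only thing in this sketch that lets successive passes behave differently, which is the symmetry-breaking role the abstract attributes to IterAdaLN. In a real model the block would be an attention-plus-MoE layer rather than the simple callable assumed here.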


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
