Global News Digest

arXiv

ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

Abstract: While Mixture-of-Experts (MoE) architectures achieve scalability by activating only a small number of experts per token, their training is hindered by the discrete, non-differentiable nature of top-$k$ routing. Expert selection therefore requires gradient estimators, and designing effective ones remains a significant open challenge. To address this, we present ProbMoE, a framework that treats expert selection as a distribution over expert subsets of fixed cardinality, casting routing as probabilistic inference in this discrete space.
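To make "a distribution over expert subsets of fixed cardinality" concrete, here is a minimal brute-force sketch for a single token. It assumes, and the abstract does not say this, that the distribution is p(S) ∝ exp(∑_{i∈S} θ_i), where θ are the router logits. The function names are hypothetical and are not part of ProbMoE.

```python
import itertools
import math


def subset_distribution(logits, k):
    """Enumerate p(S) for every size-k subset S of experts (one token).

    Hypothetical helper. Assumes p(S) is proportional to
    exp(sum of logits[i] for i in S); the abstract does not give the
    parameterization.
    """
    subsets = list(itertools.combinations(range(len(logits)), k))
    scores = [sum(logits[i] for i in s) for s in subsets]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return {s: w / z for s, w in zip(subsets, weights)}


def brute_force_marginals(logits, k):
    """P(expert i is in S): sum of p(S) over every subset S containing i."""
    marginals = [0.0] * len(logits)
    for subset, p in subset_distribution(logits, k).items():
        for i in subset:
            marginals[i] += p
    return marginals
```

Every subset has exactly $k$ members, so the marginals always sum to $k$. For example, with three experts, logits 0, ln 2 and ln 3, and $k$ = 2, the unnormalized subset weights are 2, 3 and 6, which gives marginals of 5/11, 8/11 and 9/11. Enumeration requires C(n, k) terms, which already reaches the billions for 64 experts with $k$ = 8. It is therefore useful only for illustration.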

Our approach introduces ProbMoE Exact-$k$ routing, which samples a subset of $k$ experts in the forward pass. In the backward pass, it uses the exact marginal probability of each expert as a computationally efficient surrogate for the true gradient. ProbMoE also extends to a dynamic-$k$ setting, in which the routing cardinality is restricted to a predefined range during both training and inference, letting the model allocate experts adaptively on a per-token basis.
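The Exact-$k$ forward/backward split can be sketched under the same assumed parameterization, p(S) ∝ exp(∑_{i∈S} θ_i). The code below uses a standard dynamic program over elementary symmetric polynomials, the technique behind conditional Poisson sampling. It computes each expert's exact marginal in O(nk) time and draws an exact size-$k$ subset one expert at a time. This is a textbook technique chosen for illustration: the abstract does not say how ProbMoE computes its marginals, and all function names here are hypothetical.

```python
import math
import random


def _weights(logits):
    # Shifting all logits by a constant scales every size-k subset weight by
    # the same factor, so marginals and sampling probabilities are unchanged.
    top = max(logits)
    return [math.exp(x - top) for x in logits]


def _esp_tables(w, k):
    """Elementary symmetric polynomials of prefixes and suffixes of w.

    prefix[i][j] = e_j(w[:i]) and suffix[i][j] = e_j(w[i:]) for j = 0..k.
    """
    n = len(w)
    prefix = [[0.0] * (k + 1) for _ in range(n + 1)]
    suffix = [[0.0] * (k + 1) for _ in range(n + 1)]
    prefix[0][0] = 1.0
    suffix[n][0] = 1.0
    for i in range(n):
        prefix[i + 1][0] = 1.0
        for j in range(1, k + 1):
            prefix[i + 1][j] = prefix[i][j] + w[i] * prefix[i][j - 1]
    for i in range(n - 1, -1, -1):
        suffix[i][0] = 1.0
        for j in range(1, k + 1):
            suffix[i][j] = suffix[i + 1][j] + w[i] * suffix[i + 1][j - 1]
    return prefix, suffix


def exact_k_marginals(logits, k):
    """P(expert i in S) under the assumed p(S) ∝ exp(sum of logits over S), |S| = k."""
    w = _weights(logits)
    prefix, suffix = _esp_tables(w, k)
    z = suffix[0][k]  # e_k(w), the partition function
    marginals = []
    for i in range(len(w)):
        # e_{k-1} of every weight except w[i], combined from left and right parts.
        others = sum(prefix[i][a] * suffix[i + 1][k - 1 - a] for a in range(k))
        marginals.append(w[i] * others / z)
    return marginals


def sample_exact_k(logits, k, rng=random):
    """Draw one size-k subset from the same distribution, expert by expert."""
    w = _weights(logits)
    _, suffix = _esp_tables(w, k)
    n, remaining, chosen = len(w), k, []
    for i in range(n):
        if remaining == 0:
            break
        if remaining == n - i:
            p_include = 1.0  # every remaining expert must be taken
        else:
            # P(include i | `remaining` slots left among w[i:])
            p_include = w[i] * suffix[i + 1][remaining - 1] / suffix[i][remaining]
        if rng.random() < p_include:
            chosen.append(i)
            remaining -= 1
    return chosen
```

In an autodiff framework, one plausible way to use the marginals as a backward-pass surrogate is a straight-through-style mask, `mask + marginals - stop_gradient(marginals)`. This wiring is our assumption and is not taken from the paper. The expression equals the sampled 0/1 mask in the forward pass but propagates gradients through the marginals in the backward pass.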

Empirical evaluations across various model backbones and benchmarks show that ProbMoE Exact-$k$ performs robustly against strong baselines while also improving routing diversity and expert utilization. ProbMoE Dynamic-$k$ achieves comparable performance while activating fewer experts.


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators about easing ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.