Global News Digest

arXiv

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Abstract

Mixture-of-Experts (MoE) architectures have emerged as a leading strategy for decoupling parameter count from computational cost in large language models, yet scaling their effectiveness remains challenging. Prior work indicates that fine-grained experts enlarge the space of possible expert combinations and thereby improve flexibility; however, they also add considerable routing overhead, which imposes a new limit on scalability. This study investigates an alternative scaling dimension: how expert outputs are combined. Through theoretical analysis, we show that replacing conventional weighted summation with structural aggregation increases the diversity of expert combinations without modifying the experts or the routing mechanism, and also enables multi-step reasoning within a single MoE layer. To implement this, we introduce DAG-MoE, a sparse MoE architecture with a lightweight component that automatically identifies the most efficient aggregation structure among the selected experts. Comprehensive evaluations in standard language modeling settings show that DAG-MoE delivers consistent gains in both pretraining and fine-tuning, outperforming established MoE baselines.
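To make the contrast between weighted summation and structural aggregation concrete, here is a minimal sketch in plain Python. It is not the DAG-MoE implementation: the abstract does not describe the paper's aggregation rule or its lightweight structure-finding component. Everything here is a hypothetical assumption for illustration: experts are plain functions on small vectors, the helpers `weighted_sum_aggregate` and `dag_aggregate` are invented names, and in the DAG version a selected expert receives the token input `x` plus the outputs of its parent experts. The layer output is the gate-weighted sum of every node's output. The function `count_labeled_dags` uses Robinson's recurrence to count the possible wirings of k selected experts. This is one simple way to see why structure can add combinational diversity, not the paper's own analysis.

```python
from math import comb
from typing import Callable, Dict, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [s * x for x in a]


def weighted_sum_aggregate(x: Vector, experts: List[Expert],
                           selected: List[int], gates: List[float]) -> Vector:
    """Conventional MoE combine: y = sum_i g_i * E_i(x); every expert sees only x."""
    out = [0.0] * len(x)
    for idx, g in zip(selected, gates):
        out = add(out, scale(experts[idx](x), g))
    return out


def dag_aggregate(x: Vector, experts: List[Expert], order: List[int],
                  parents: Dict[int, List[int]], gates: List[float]) -> Vector:
    """Hypothetical structural combine over a DAG of the selected experts.

    `order` is a topological order of the selected expert ids, and `gates[j]`
    is the gate of `order[j]`. `parents[i]` lists the selected experts whose
    outputs are added to x before expert i runs (assumed wiring). The result
    is the gate-weighted sum of all node outputs.
    """
    outputs: Dict[int, Vector] = {}
    y = [0.0] * len(x)
    for idx, g in zip(order, gates):
        h = x
        for p in parents.get(idx, []):
            h = add(h, outputs[p])  # parent must appear earlier in `order`
        outputs[idx] = experts[idx](h)
        y = add(y, scale(outputs[idx], g))
    return y


def count_labeled_dags(n: int) -> int:
    """Number of DAGs on n labeled nodes (Robinson's recurrence)."""
    a = [1] + [0] * n
    for m in range(1, n + 1):
        a[m] = sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                   for k in range(1, m + 1))
    return a[n]


if __name__ == "__main__":
    experts = [lambda v: [2 * t for t in v], lambda v: [t + 1 for t in v]]
    x = [1.0, 2.0]
    print(weighted_sum_aggregate(x, experts, [0, 1], [0.5, 0.5]))  # [2.0, 3.5]
    print(dag_aggregate(x, experts, [0, 1], {1: [0]}, [0.5, 0.5]))  # [3.0, 5.5]
    print([count_labeled_dags(n) for n in range(5)])  # [1, 1, 3, 25, 543]
```

With no edges (`parents={}`), `dag_aggregate` reduces exactly to the weighted sum, so the conventional combine is one special case among the possible structures. The chain 0 → 1 composes two experts inside a single layer, which loosely mirrors the "multi-step" computation the abstract mentions. For four selected experts there are 543 labeled DAGs, compared with the single parallel pattern of weighted summation.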

Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is discussing with regulators to ease ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.