arXiv

DynMuon: A Dynamic Spectral Shaping View of Muon

arXiv:2605.17109v3 (announce type: replace-cross)

Abstract: In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the usual update matrix $M=U\Sigma V^\top$ with its polar factor $UV^\top$. In this work, we consider a class of Muon-like updates, where we replace the update $M$ with $U\Sigma^p V^\top$ for some parameter $p$. We call this a "spectral-shaping" operation, and develop a theory of how to pick $p$ which depends on (a) local curvature of the loss function, (b) noise stemming from stochastic gradients and label noise, and (c) training stage. Our theory and experimentation reveal a previously overlooked behavior: positive $p$ helps early by emphasizing high-curvature directions and accelerating signal contraction, while mildly negative $p$ helps later by reallocating update strength toward low-curvature directions that still contain useful training signals. Building on the insight, we propose DynMuon, an efficient dynamic spectral shaping method that schedules $p$ from positive to mildly negative over training. Extensive experiments across model sizes, architectures, and training settings show that DynMuon consistently achieves lower validation loss than Muon, while requiring 10.6-26.5% fewer steps to reach the same target loss. Our code is available at https://github.com/fzwark/DynMuon.
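The spectral-shaping operation $M \mapsto U\Sigma^p V^\top$ described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: it uses an explicit SVD for clarity, whereas practical Muon implementations typically approximate the polar factor with Newton-Schulz iterations, and the abstract does not say how DynMuon computes the shaped update efficiently. The function names `spectral_shape` and `linear_p_schedule`, the clamp `eps`, the linear form of the schedule, and the endpoint values `p_start=0.5` and `p_end=-0.1` are all assumptions chosen for illustration; the abstract states only that $p$ moves from positive to mildly negative.

```python
import numpy as np


def spectral_shape(M, p, eps=1e-8):
    """Return U diag(s**p) V^T, where M = U diag(s) V^T is the thin SVD.

    p = 1 reproduces M itself; p = 0 gives the polar factor U V^T (Muon).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Assumption: clamp tiny singular values so a negative p cannot blow up.
    s = np.maximum(s, eps)
    # Broadcasting scales column i of U by s[i]**p before multiplying by V^T.
    return (U * s**p) @ Vt


def linear_p_schedule(step, total_steps, p_start=0.5, p_end=-0.1):
    """Hypothetical schedule: interpolate p linearly from p_start to p_end."""
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return (1.0 - t) * p_start + t * p_end


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 3))  # stand-in for an update matrix
    for step in (0, 50, 100):
        p = linear_p_schedule(step, total_steps=100)
        shaped = spectral_shape(M, p)
        print(step, round(p, 3), np.linalg.svd(shaped, compute_uv=False))
```

With positive $p$ the larger singular values of $M$ stay larger in the shaped update; with negative $p$ the ordering inverts, so directions with small singular values receive relatively more update strength. How this relates to curvature and noise is the subject of the paper's theory and is not reproduced here.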

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
