arXiv

How Much Orthogonalization Does Muon Need?

Title: Reevaluating the Orthogonalization Requirements of Muon

Abstract:

Muon optimizers improve neural network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal ones. This raises a practical question: how much orthogonalization does Muon actually need? To study it, we use a relaxed cubic Newton–Schulz schedule tailored to the low-precision singular-value band in which Muon operates. Five cubic steps require only ten dominant matrix multiplications, compared with fifteen for five quintic Newton–Schulz iterations. The cubic schedule is not intended as a better polar solver. It is a principled, low-cost probe for examining how polar accuracy, spectral shaping, and training performance relate.
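The matmul count above follows from how each iteration is built. A cubic step X ← aX + b(XXᵀ)X needs two products (the Gram matrix, then Gram times X). A quintic step X ← aX + (bA + cA²)X with A = XXᵀ needs three. The following is a minimal NumPy sketch that only illustrates this bookkeeping. It assumes the classic cubic coefficients (1.5, −0.5) as placeholders, because the relaxed, band-tailored schedule is not specified in the abstract. It also assumes "Muon-Jordan" refers to the quintic coefficients (3.4445, −4.7750, 2.0315) used in Keller Jordan's public Muon implementation. The function names are hypothetical.

```python
import numpy as np


def _orient(x):
    # Work in the "wide" orientation so the Gram matrix X X^T is the smaller one.
    return (x.T, True) if x.shape[0] > x.shape[1] else (x, False)


def cubic_ns(g, steps=5, a=1.5, b=-0.5):
    """Cubic Newton-Schulz sketch: X <- a*X + b*(X X^T) X, two matmuls per step.

    ASSUMPTION: (a, b) = (1.5, -0.5) are placeholder coefficients; the paper's
    relaxed schedule is not reproduced here. Returns (iterate, matmul count).
    """
    x, transposed = _orient(g)
    x = x / (np.linalg.norm(x) + 1e-7)  # Frobenius scaling puts singular values in (0, 1]
    matmuls = 0
    for _ in range(steps):
        gram = x @ x.T               # matmul 1
        x = a * x + b * (gram @ x)   # matmul 2
        matmuls += 2
    return (x.T if transposed else x), matmuls


def quintic_ns(g, steps=5, a=3.4445, b=-4.7750, c=2.0315):
    """Quintic Newton-Schulz: A = X X^T; X <- a*X + (b*A + c*A^2) X, three matmuls per step."""
    x, transposed = _orient(g)
    x = x / (np.linalg.norm(x) + 1e-7)
    matmuls = 0
    for _ in range(steps):
        gram = x @ x.T                        # matmul 1
        poly = b * gram + c * (gram @ gram)   # matmul 2
        x = a * x + poly @ x                  # matmul 3
        matmuls += 3
    return (x.T if transposed else x), matmuls


rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 256))  # stand-in for a momentum matrix
x_cubic, n_cubic = cubic_ns(grad)
x_quintic, n_quintic = quintic_ns(grad)
print(n_cubic, n_quintic)  # 10 15
```

Only the products that touch the full iterate or its Gram matrix are counted. Scalar scaling and additions are cheap by comparison. With the placeholder cubic coefficients, five steps leave small singular values far from 1. This is one reason a relaxed, band-tailored schedule would be needed in practice.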

Through synthetic diagnostics, NanoGPT ablations, and training runs on hybrid MoE/Mamba architectures, we show that training quality does not improve monotonically with polar-decomposition accuracy. On GPT-2 Small, truncated Polar Express, Muon-Jordan, cubic Newton–Schulz, and an explicit FP32 SVD polar factor all reach nearly identical final losses. On hybrid MoE/Mamba models with one to four billion parameters, cubic5 (the five-step cubic schedule) matches the Muon-Jordan quintic update to within about $10^{-3}$ in validation loss. These results support cubic5 as a viable low-cost orthogonalization variant for Muon, with empirical evidence of training-quality parity in the tested settings.
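To make "polar accuracy" concrete, the exact polar factor of G = USVᵀ is UVᵀ, and every singular value of that factor equals 1. A Newton–Schulz iteration keeps the singular vectors of G but only pushes the singular values into a band near 1. The following is a minimal sketch, not the paper's diagnostic. It computes an FP32 SVD polar factor and compares it with a five-step quintic iterate, assuming the same widely used Muon quintic coefficients as above and a wide input matrix. The helper names are hypothetical.

```python
import numpy as np


def svd_polar_fp32(g):
    """Exact polar factor U V^T of G = U S V^T, computed with an FP32 SVD."""
    u, _, vt = np.linalg.svd(g.astype(np.float32), full_matrices=False)
    return u @ vt


def quintic_ns(g, steps=5, a=3.4445, b=-4.7750, c=2.0315):
    """Five-step quintic Newton-Schulz (Muon-style coefficients); assumes rows <= cols."""
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * (gram @ gram)) @ x
    return x


rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 256))  # wide stand-in for a momentum matrix

polar = svd_polar_fp32(grad)
approx = quintic_ns(grad)

# Exact polar factor: all singular values are 1.
# Newton-Schulz output: same singular vectors, singular values spread in a band near 1.
sv_polar = np.linalg.svd(polar, compute_uv=False)
sv_approx = np.linalg.svd(approx, compute_uv=False)
rel_err = np.linalg.norm(approx - polar) / np.linalg.norm(polar)
print(f"NS singular values in [{sv_approx.min():.2f}, {sv_approx.max():.2f}]; "
      f"relative polar error {rel_err:.3f}")
```

The quintic iterate is measurably far from the exact polar factor even though, per the results above, both lead to nearly the same final loss. That gap is the distinction between polar accuracy and training quality that the study examines.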


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
