arXiv

MusaCoder: Native GPU Kernel Generation with Full-Stack Training on Moore Threads GPU

Title: MusaCoder: Achieving Native GPU Kernel Generation via Full-Stack Training on Moore Threads Architecture

Abstract:

The core challenge of native GPU kernel generation is transforming high-level tensor programs into efficient, executable low-level code. Existing Large Language Models (LLMs) struggle in this domain, and execution-based reinforcement learning (RL) approaches are often hindered by sparse rewards, reward hacking, and training instability. To address these challenges, we introduce MusaCoder, a full-stack training framework for native GPU kernel generation across both CUDA and MUSA backends.

MusaCoder integrates three key components: progressive kernel-oriented data synthesis, diversity-preserving rejection fine-tuning, and execution-feedback reinforcement learning backed by MooreEval, a distributed verifier and reward environment. To stabilize RL training, the framework employs three mechanisms: PrimeEcho, which anchors multi-turn rewards to the first turn; Buffered Dynamic Retry, which recovers training signal from hard samples that fail completely; and MirrorPop, which filters off-policy sequences.
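To make the idea of an execution-feedback reward concrete, the following is a minimal sketch of a correctness-gated speedup reward. It is not MusaCoder's or MooreEval's actual reward: the `ExecutionResult` record, the `execution_reward` function, and the `max_speedup` cap are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """Outcome of running one candidate kernel (hypothetical record type)."""
    compiled: bool        # the generated kernel built successfully
    correct: bool         # outputs matched the reference within tolerance
    kernel_ms: float      # measured runtime of the candidate kernel
    reference_ms: float   # measured runtime of the reference program


def execution_reward(result: ExecutionResult, max_speedup: float = 10.0) -> float:
    """Return 0 for any failed candidate, else the speedup clipped to max_speedup.

    Assumption: this reward shape is illustrative only. Gating on correctness
    means a fast but wrong kernel earns nothing, and the cap limits the payoff
    from implausibly large timing ratios.
    """
    if not (result.compiled and result.correct):
        return 0.0
    if result.kernel_ms <= 0.0:
        # A non-positive timing is treated as a measurement failure.
        return 0.0
    speedup = result.reference_ms / result.kernel_ms
    return min(speedup, max_speedup)
```

Under a reward of this shape, a batch in which every candidate fails yields all-zero rewards and therefore no learning signal, which is one way to read the sparse-reward problem the abstract describes.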

Experiments on KernelBench and a MUSA-ported variant show that MusaCoder surpasses strong open-source and proprietary baselines in both correctness and measured speedup. Specifically, the 9B model matches or exceeds leading closed-source models, while the 27B model sets a new state of the art. These findings demonstrate the efficacy of full-stack execution-feedback training for native kernel generation and show that Moore Threads GPUs can support the entire LLM post-training stack, offering a practical foundation for optimizing large models on emerging accelerators.
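Correctness and speedup can be combined into a single score in the style of KernelBench's fast_p metric: the fraction of tasks whose generated kernel is both correct and faster than the baseline by more than a factor p. The abstract does not state which metric the paper reports, so the sketch below is an assumption, and the `fast_p` helper and its input layout are hypothetical.

```python
from typing import List, Tuple


def fast_p(results: List[Tuple[bool, float]], p: float = 1.0) -> float:
    """Fraction of tasks that are correct AND have speedup strictly above p.

    Each entry in `results` is (correct, speedup) for one task, where speedup
    is reference time divided by generated-kernel time. With p = 0 the score
    reduces to the plain correctness rate, since any positive speedup passes.
    """
    if not results:
        return 0.0
    hits = sum(1 for correct, speedup in results if correct and speedup > p)
    return hits / len(results)


# Illustrative inputs only, not results from the paper.
example = [(True, 1.5), (True, 0.8), (False, 3.0), (True, 2.0)]
correctness_rate = fast_p(example, p=0.0)
faster_than_baseline = fast_p(example, p=1.0)
```

An incorrect kernel never counts, however fast it runs, so the score cannot be raised by trading correctness for speed.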


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
