arXiv

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

Title: CodegenBench: Can Large Language Models Generate Efficient Code for Diverse Architectures?

Abstract:

Although large language models (LLMs) have been extensively evaluated on code generation for general-purpose programming and for GPU-centric frameworks such as PyTorch and CUDA, their ability to produce high-performance computing (HPC) code for CPUs across varied hardware remains largely unexamined. To address this gap, we present CodegenBench, a benchmarking framework for assessing how well LLMs generate efficient parallel code for three hardware platforms: Sunway, Kunpeng, and x86_64.

The benchmark consists of 106 standard Basic Linear Algebra Subprograms (BLAS) routines, which serve as a foundational baseline, complemented by 20 specialized computational kernels tailored to the unique demands of each supercomputing architecture (specifically LeetSunway and LeetKunpeng). Our analysis shows that although leading LLMs can produce optimized code for widely supported architectures such as x86_64, their performance drops notably on domain-specific architectures that lack extensive public documentation and training data. This disparity points to significant challenges in cross-platform generalization.

Additionally, our investigation into factors affecting code quality, such as task complexity and implementation length, suggests that current LLMs perform best on moderately challenging tasks that require only short code snippets. To support ongoing research in LLM-driven high-performance code generation, we have released our dataset and automated evaluation infrastructure. These resources can be accessed at https://anonymous.4open.science/r/CodegenBench-EDE1/ and https://anonymous.4open.science/r/CodegenBenchDataset-2551.


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
