arXiv

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

June 2, 2026 · Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong, Ameesh Makadia, Meiqi Guo, Laurent Itti, Jindong Chen

Abstract

Procedural 3D modeling via code is gaining traction as a flexible paradigm: it delivers assets that are deterministic, ready for engine integration, and easily editable, capabilities that neural 3D generators typically fail to provide. However, creating such procedural content requires specialized knowledge of parametric design, code-level geometric reasoning, and 3D software APIs. To assess how well models can take on this work, we introduce 3DCodeBench, a comprehensive benchmark that evaluates vision-language model (VLM) agents on procedural 3D generation within modeling software.

3DCodeBench measures how well 12 leading VLMs act as procedural 3D modelers, converting text and image references into procedural code compatible with 3D modeling applications. Because automated metrics often fail to capture the perceptual quality of 3D shapes, we also developed 3DCodeArena, a ranking system grounded in pairwise human preferences over the generated 3D outputs.
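As an illustration of how pairwise preferences can be turned into a ranking, the sketch below applies a standard Elo-style update to a list of votes. The abstract does not state which rating model 3DCodeArena uses, so the Elo formula, the `k` and `base` parameters, and the model names are all assumptions made for this example.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from pairwise preference votes.

    votes: iterable of (winner, loser) pairs of model names.
    k, base: illustrative constants, not values taken from the paper.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        rw, rl = ratings[winner], ratings[loser]
        # Probability the winner was expected to win under the Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        # Unexpected wins move ratings more than expected ones.
        delta = k * (1.0 - expected_w)
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return dict(ratings)


# Hypothetical votes: each tuple means a human preferred the first model's output.
votes = [("model_a", "model_b")] * 3 + [("model_b", "model_c")]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Sequential Elo depends on vote order; a leaderboard with many votes might instead fit a Bradley-Terry model over all comparisons at once.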

Our extensive evaluations yield several key observations: (1) Most failures stem from API mismatches, yet even successful renders frequently exhibit disconnected or floating geometric components. (2) Test-time scaling strategies, such as larger thinking budgets and multi-turn refinement, improve overall performance. These findings underscore the urgent need for high-quality procedural coding datasets to drive the progress of commercial VLMs. Additionally, effective procedural 3D modeling depends on a robust execution environment that provides high-fidelity feedback for iterative refinement.
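The multi-turn refinement idea can be sketched as a loop that runs generated code, captures any execution error, and hands it back to the model on the next turn. This is a minimal sketch, not the benchmark's harness: `refine`, `toy_generate`, and `make_cube` are hypothetical names, Python's `exec` stands in for running a script inside 3D modeling software, and the toy generator stands in for a VLM call.

```python
import traceback


def refine(generate, prompt, max_turns=3):
    """Run generate -> execute -> feedback for up to max_turns turns.

    generate(prompt, feedback) returns a code string; it is a hypothetical
    stand-in for a VLM call.
    """
    feedback = None
    code = ""
    for turn in range(1, max_turns + 1):
        code = generate(prompt, feedback)
        try:
            namespace = {}
            # Stand-in for executing the script in the modeling software.
            exec(code, namespace)
            return {"turns": turn, "code": code, "ok": True}
        except Exception:
            # The error text becomes feedback for the next turn.
            feedback = traceback.format_exc(limit=1)
    return {"turns": max_turns, "code": code, "ok": False}


def toy_generate(prompt, feedback):
    """Toy generator: the first attempt calls a function that does not exist."""
    if feedback is None:
        return "mesh = make_cube(size=2)"  # undefined name, raises NameError
    return "mesh = {'type': 'cube', 'size': 2}"


result = refine(toy_generate, "a cube with side length 2")
```

Here the first attempt fails in a way that loosely mimics an API mismatch, and the second attempt succeeds after seeing the error. A real environment would also need to return visual feedback, since code that runs cleanly can still produce disconnected or floating parts.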

We make 3DCodeBench publicly available. It comprises a curated large-scale dataset of (multimodal text/image prompt, procedural code, 3D object) triplets, an evaluation protocol, and the public 3DCodeArena platform. This collection serves as a foundational toolkit for future research into VLM-based procedural 3D modelers.

arXiv:2606.01057v1 Announce Type: cross
