Title: DVD: Discrete Voxel Diffusion for 3D Generation and Editing
Abstract:
We present Discrete Voxel Diffusion (DVD), a novel discrete diffusion framework for generating, evaluating, and manipulating sparse voxels within SLat (Structured LATent) 3D generative pipelines. Although discrete diffusion methods have not yet overtaken continuous diffusion in tasks resembling image generation, our findings show that they are effective as a robust first-stage prior for sparse voxel scaffolds. By modeling voxel occupancy as an inherently discrete variable, DVD eliminates the need for continuous-to-discrete thresholding and offers a streamlined approach to voxel generation, uncertainty quantification, and editing. In addition to improving quality, the method makes generation dynamics more interpretable through explicit categorical modeling. We also use predictive entropy as a reliable uncertainty metric to pinpoint ambiguous voxel regions and difficult samples, which supports data filtering and quality assessment. Lastly, we introduce a lightweight fine-tuning technique that uses block-structured perturbation patterns. This strategy enables the model to perform voxel inpainting and editing in a single sampling step, with minimal auxiliary computation and no extra model evaluations.
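As a rough illustration of the predictive-entropy idea mentioned in the abstract, the sketch below computes per-voxel Shannon entropy from categorical occupancy probabilities. The abstract does not give the exact formulation, so everything here is an assumption: the function name `predictive_entropy`, the two-state (empty, occupied) encoding, the mean-entropy sample score, and the 0.65 threshold are all hypothetical and chosen only for illustration.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Per-voxel Shannon entropy (in nats) of categorical probabilities.

    probs: array of shape (..., K) whose last axis sums to 1.
    This is the standard entropy formula, not necessarily the paper's exact metric.
    """
    # Clip only inside the log so that p = 0 contributes 0 instead of NaN.
    safe = np.clip(probs, eps, 1.0)
    return -np.sum(probs * np.log(safe), axis=-1)

# Hypothetical 2x2x2 grid of predicted occupancy probabilities.
p_occ = np.array([[[0.01, 0.5], [0.99, 0.2]],
                  [[0.5, 0.9], [0.05, 0.7]]])

# Assumed two-state encoding: index 0 = empty, index 1 = occupied.
probs = np.stack([1.0 - p_occ, p_occ], axis=-1)  # shape (2, 2, 2, 2)

entropy = predictive_entropy(probs)              # shape (2, 2, 2)
sample_score = float(entropy.mean())             # one scalar per sample
ambiguous = entropy > 0.65                       # illustrative threshold only
```

Under these assumptions, voxels whose occupancy probability sits near 0.5 receive entropy close to ln 2 and are flagged as ambiguous, while a per-sample average such as `sample_score` could be used to rank samples for filtering.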
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





