Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo
Abstract:
Creating a photorealistic 3D face avatar from just one casual photograph remains a significant hurdle. Current feed-forward 3D Gaussian Splatting (3DGS) methods often struggle when faced with out-of-distribution inputs, while pretrained diffusion models, despite generating high-quality images, frequently fail to maintain consistency across multiple views. We argue that these two approaches are inherently complementary: explicit 3D structures ensure geometric coherence, while 2D diffusion priors drive photorealistic detail. Leveraging this synergy, we introduce SplatShot, a novel, training-free framework that integrates these representations directly into the denoising phase. Starting with a foundational 3DGS face model and a single reference image, SplatShot performs joint denoising across all target viewpoints through a per-step 3D feedback mechanism. During each timestep, the system generates clean images from noisy latents, updates the 3DGS model based on these multi-view predictions, and then back-propagates the photometric errors between the re-rendered 3DGS outputs and the 2D predictions into the noise estimate. This process guides the sampling trajectory toward results that are strictly consistent in 3D while faithfully preserving identity. Our experiments, conducted on a wide variety of in-the-wild images, show that SplatShot generates 3D avatars that excel in identity retention, photorealism, and multi-view consistency.
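The per-step 3D feedback loop described in the abstract can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not taken from the paper: each "image" is a single scalar, the "3D model" is one shared scalar that every view renders identically (a stand-in for 3DGS), the hypothetical `toy_eps` plays the role of a pretrained diffusion model whose per-view priors disagree, the sampler is a deterministic DDIM-style update, and the guidance step size is scaled so the implied clean image moves a fixed fraction toward the render.

```python
import math
import random


def toy_eps(x, alpha, sigma, view_mean):
    """Hypothetical stand-in for a pretrained diffusion model.

    Exact noise prediction for a unit-variance Gaussian prior centred on
    view_mean, valid when alpha**2 + sigma**2 == 1.
    """
    return sigma * (x - alpha * view_mean)


def render(scene, view_index):
    """Toy renderer: every view sees the same scalar.

    A real system would rasterize 3D Gaussians from the given camera.
    """
    return scene


def spread(values):
    """Multi-view inconsistency measure for this toy: max minus min."""
    return max(values) - min(values)


def sample(view_means, steps=20, guidance=0.5, scene_lr=0.5, seed=0):
    """Jointly denoise all views with per-step feedback from a shared model."""
    rng = random.Random(seed)
    noise = rng.gauss(0.0, 1.0)
    latents = [noise for _ in view_means]  # assumption: shared starting noise
    scene = 0.0  # stand-in for the initial 3D face model
    theta_max = 0.45 * math.pi  # keeps alpha away from zero at the first step

    for t in range(steps, 0, -1):
        theta = theta_max * t / steps
        theta_prev = theta_max * (t - 1) / steps
        a, s = math.cos(theta), math.sin(theta)
        a_prev, s_prev = math.cos(theta_prev), math.sin(theta_prev)

        # 1. Predict a clean image for every view from its noisy latent.
        eps = [toy_eps(x, a, s, m) for x, m in zip(latents, view_means)]
        x0 = [(x - s * e) / a for x, e in zip(latents, eps)]

        # 2. Update the shared model: one gradient step on the mean squared
        #    photometric loss (constant factor folded into scene_lr).
        scene += scene_lr * (sum(x0) / len(x0) - scene)

        # 3. Re-render, then push the photometric error into the noise estimate.
        next_latents = []
        for v, (x, e, pred) in enumerate(zip(latents, eps, x0)):
            residual = render(scene, v) - pred
            # d/d(eps) of residual**2, using d(pred)/d(eps) = -s / a.
            grad_eps = 2.0 * residual * s / a
            # Toy step size: makes the guided clean image equal
            # pred + guidance * residual.
            step = guidance * a * a / (2.0 * s * s)
            e_guided = e - step * grad_eps
            x0_guided = (x - s * e_guided) / a
            # Deterministic DDIM-style move to the next noise level.
            next_latents.append(a_prev * x0_guided + s_prev * e_guided)
        latents = next_latents

    return latents, scene
```

Calling `sample([-1.0, 0.0, 1.0], guidance=0.0)` and `sample([-1.0, 0.0, 1.0], guidance=0.5)` and comparing `spread` of the returned views shows the intended effect in this toy setting: with feedback enabled, the views end up closer to one another than when each view is denoised independently.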
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





