arXiv

Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation

Title: Generating in Reconstruction Space, Matching in Semantic Space: Transport Geometry for One-Step Generation

Abstract:

Generative modeling and self-supervised representation learning (SSL) pursue fundamentally distinct optimization goals: the former prioritizes distributional fidelity, whereas the latter emphasizes semantic coherence. Although recent studies have consistently demonstrated that SSL features enhance generative training, the underlying mechanism driving this synergy remains elusive. This study investigates the utility of SSL within the context of one-step generation, a framework that explicitly leverages representations by employing frozen SSL features to align generated samples with real data. We utilize Sinkhorn divergence within this feature space as a computationally efficient surrogate for the Wasserstein distance, which serves as the population-level discrepancy metric approximated by Fréchet-style evaluations (e.g., FID). Our findings indicate that this objective achieves superior performance when applied to semantically structured SSL feature spaces, yielding a 39-fold reduction in ImageNet FID. We attribute this improvement primarily to the dynamics of matching estimation: by filtering out nuisance reconstruction details, semantic SSL features create a more compact geometry, thereby rendering distribution matching more tractable. Consequently, the optimal SSL features for training do not necessarily align with those used in evaluation metrics. For instance, we demonstrate that employing Inception as a feature extractor can artificially lower FID scores while simultaneously compromising matching stability and sample quality, a phenomenon indicative of metric hacking. Through extensive experiments on ImageNet, we delineate which SSL feature families yield the highest generation performance and propose matching stability as a quantitative metric for their selection. Code is available at https://github.com/Genentech/semantic-transport-generation.
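The matching objective described in the abstract, a Sinkhorn divergence computed between feature vectors of generated and real samples, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the squared Euclidean cost, the uniform sample weights, the regularization strength `eps`, and the iteration count are all choices made here for illustration. The frozen SSL encoder is replaced by a hypothetical fixed linear map (`frozen_features`) so the example stays self-contained.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot_cost(x, y, eps=1.0, n_iters=200):
    """Entropic OT cost between two uniform empirical measures.

    Runs log-domain Sinkhorn iterations on the dual potentials (f, g)
    and returns the dual value <a, f> + <b, g>.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean cost (an assumption of this sketch).
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    return float(np.mean(f) + np.mean(g))


def sinkhorn_divergence(x, y, eps=1.0, n_iters=200):
    """Debiased divergence: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (
        entropic_ot_cost(x, y, eps, n_iters)
        - 0.5 * entropic_ot_cost(x, x, eps, n_iters)
        - 0.5 * entropic_ot_cost(y, y, eps, n_iters)
    )


def frozen_features(samples, weight):
    """Hypothetical stand-in for a frozen SSL encoder: a fixed linear map."""
    return samples @ weight


# Toy data: "real" samples and "generated" samples shifted away from them.
rng = np.random.default_rng(0)
weight = rng.normal(size=(16, 4))      # fixed (never updated) encoder weights
real = rng.normal(size=(64, 16))
generated = real + 3.0

loss = sinkhorn_divergence(
    frozen_features(generated, weight), frozen_features(real, weight)
)
```

In a training loop the same quantity would be computed in a differentiable framework so that gradients flow to the generator while the encoder stays frozen; the NumPy version above only evaluates the divergence.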


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
