arXiv

Chameleon: Style-Content Disentangled Framework for Cross-Domain Object Compositing


Abstract:

Image compositing aims to seamlessly integrate a foreground object into a background scene. Recent diffusion models have markedly improved compositing quality, particularly when both elements come from the same domain, such as natural images. Cross-domain compositing, however, remains a significant challenge: the model must preserve the identity of the foreground object while adapting its style to match the background domain. This setting is under-researched, and most existing solutions rely on training-free blending and refinement techniques. This reliance stems largely from the scarcity of large-scale paired datasets for cross-domain scenarios, which has hindered the development of training-based alternatives. As a result, prior methods are often limited to tone-level adjustments and produce results that are either stylistically inconsistent or over-stylized.

To address these issues, we introduce ChameleonDataset, the first large-scale training dataset for cross-domain compositing, together with a comprehensive evaluation benchmark, both built with a scalable data construction pipeline. Using this dataset, we present Chameleon, a two-stage, training-based framework for cross-domain compositing. In the first stage, Joint Hard Contrastive Learning (JHCL) trains the ChameleonEncoder to disentangle style and content representations. In the second stage, we integrate Spatio-Temporal Attention Gating (STAG) into a diffusion transformer to achieve effective stylization; this mechanism adaptively controls how style tokens from the first-stage encoder are injected across both spatial and temporal dimensions. Chameleon outperforms state-of-the-art in-domain and cross-domain compositing methods, sequential pipelines, and commercial tools in both compositional plausibility and stylistic fidelity.
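The abstract names JHCL and STAG but gives no equations, so the following is only a rough illustration of two generic building blocks that such components could plausibly rest on. The first is an InfoNCE-style contrastive loss with hard negatives. The second is style cross-attention whose output is scaled by a per-token spatial gate and a per-step temporal gate. Everything here is an assumption and not the authors' method. This includes the function names, the choice of InfoNCE, the temperature, the reading of "temporal" as the diffusion timestep, and the sigmoid timestep schedule (`center`, `sharpness`). Learned query/key/value projections are also omitted for brevity.

```python
import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def hard_negative_info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor, one positive and K (hard) negatives.

    anchor, positive: shape (D,); negatives: shape (K, D).
    Hypothetical reading of JHCL: for a style embedding, the positive shares
    the anchor's style, while hard negatives share its content but not its style.
    """
    a = _l2_normalize(anchor)
    p = _l2_normalize(positive)
    n = _l2_normalize(negatives)
    pos_logit = float(a @ p) / temperature   # cosine similarity / temperature
    neg_logits = (n @ a) / temperature        # shape (K,)
    logits = np.concatenate([[pos_logit], neg_logits])
    # Numerically stable log-sum-exp over positive + negatives.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())
    return float(log_denominator - pos_logit)


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_style_injection(image_tokens, style_tokens, spatial_logits, t, num_steps,
                          center=0.5, sharpness=10.0):
    """Add cross-attended style features to image tokens, scaled by a
    spatial gate (one value per token) times a temporal gate (one per step).

    image_tokens: (N, D); style_tokens: (M, D); spatial_logits: (N,).
    The sigmoid timestep schedule (center, sharpness) is a hypothetical choice.
    """
    d = image_tokens.shape[-1]
    # Style tokens serve as both keys and values (no learned projections here).
    attn = _softmax(image_tokens @ style_tokens.T / np.sqrt(d), axis=-1)  # (N, M)
    style_features = attn @ style_tokens                                  # (N, D)
    spatial_gate = _sigmoid(spatial_logits)[:, None]                      # (N, 1)
    temporal_gate = _sigmoid(sharpness * (t / num_steps - center))        # scalar
    return image_tokens + spatial_gate * temporal_gate * style_features
```

In a trained model the spatial logits and the timestep schedule would presumably be learned. Here they are plain inputs so the gating behaviour is easy to inspect. A strongly negative spatial logit leaves its token essentially unchanged, and the temporal gate scales how much style is injected at a given step. The direction of that schedule is an arbitrary choice in this sketch.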


Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC
