PAI-Studio: Cinematic Video Background Replacement with Camera-Aware Motion
Title: PAI-Studio: Achieving Cinematic Video Background Swaps with Camera-Aware Motion
Abstract:
This paper introduces PAI-Studio, a reference-conditioned video synthesis framework for a persistent problem in cinematic post-production: generating dynamic backgrounds that stay consistent with foreground motion. The system preserves the subject's identity, matches the visual style of a reference scene, and produces globally consistent lighting, including realistic relighting of the foreground. Current open-source tools and commercial APIs typically fail to deliver all of these at once: they often produce static backdrops, inconsistent edges, and visible compositing artifacts, and they cannot maintain motion-consistent backgrounds alongside high-fidelity foreground relighting.
To address these limitations, we build on a Diffusion Transformer video backbone and reframe the task as an in-context conditional generation problem. Using bidirectional attention, the architecture jointly processes foreground dynamics and reference background information. We also construct a dataset of 30,000 samples drawn from high-quality films and online video sources to support this task. Extensive experiments show that PAI-Studio significantly outperforms both existing open-source solutions and commercial APIs.
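As a rough illustration of in-context conditioning with bidirectional attention, the sketch below concatenates foreground tokens and reference-background tokens into one sequence and applies a single self-attention step with no causal mask, so every token can attend to every other. This is not the paper's implementation: the function name, the single-head design, the token shapes, and the random projection weights are all assumptions chosen to keep the example small.

```python
import numpy as np

def joint_bidirectional_attention(fg_tokens, ref_tokens, seed=0):
    """Single-head self-attention over [foreground; reference] tokens.

    Hypothetical sketch: random matrices stand in for learned projections.
    fg_tokens:  (N_fg, d) array of foreground video tokens
    ref_tokens: (N_ref, d) array of reference background tokens
    """
    rng = np.random.default_rng(seed)
    d = fg_tokens.shape[-1]
    # Assumed stand-ins for learned query/key/value projections.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    # In-context conditioning: both streams share one token sequence.
    tokens = np.concatenate([fg_tokens, ref_tokens], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v

    # Bidirectional attention: no mask is applied to the score matrix.
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(1)
fg = rng.standard_normal((4, 8))   # 4 foreground tokens, width 8
ref = rng.standard_normal((3, 8))  # 3 reference tokens, width 8
out, attn = joint_bidirectional_attention(fg, ref)
```

In this sketch the off-diagonal blocks of `attn` (foreground rows over reference columns, and the reverse) are nonzero, which is what lets information flow in both directions between the two inputs.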
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





