arXiv

Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation

Abstract:

While generative video models have made significant strides in achieving high visual fidelity and temporal consistency, precise camera control remains a persistent challenge. Current frameworks typically treat camera movement as a secondary outcome of pixel generation, resulting in trajectories that are often random, spatially disjointed, and disconnected from the human subjects central to the scene. To address this, we introduce Auteur, a novel approach that enables language-driven, human-centric camera framing within generative video systems.

Our primary observation is that professional cinematographers do not conceptualize shots as trajectories through world space; rather, they define framing relative to the actor, specifying shot size, angle, and composition as variables dependent on human pose and motion. We translate this intuition into a human-centric camera parameterization and develop a Domain-Specific Language (DSL) that can be converted into standard 6-DoF camera parameters. In this pipeline, a fine-tuned multimodal large language model serves as a virtual director, translating natural language prompts and coarse human motion data into sparse DSL keyframes. These keyframes are deterministically interpolated to create continuous camera trajectories, which are subsequently fed into video generators as input.
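The pipeline above can be illustrated with a minimal sketch, under loudly stated assumptions: the abstract does not specify the DSL's vocabulary, its interpolation scheme, or its camera conventions, so the names `SHOT_EXTENT`, `Keyframe`, `sample`, and `to_camera`, the shot-size values, the linear interpolation, and the y-up coordinate frame are all hypothetical. The sketch only shows the general idea of describing framing relative to the subject (shot size, azimuth, elevation), interpolating sparse keyframes deterministically, and converting the result into a 6-DoF pose (position plus yaw, pitch, roll).

```python
import math
from dataclasses import dataclass

# Hypothetical shot-size vocabulary (not from the paper): vertical extent
# visible at the subject's depth, as a multiple of the subject's height.
SHOT_EXTENT = {"close_up": 0.3, "medium": 0.6, "full": 1.2, "wide": 2.5}


@dataclass
class Keyframe:
    t: float              # time in seconds
    shot: str             # key into SHOT_EXTENT
    azimuth_deg: float    # 0 = directly in front of the subject
    elevation_deg: float  # 0 = level with the target, positive = high angle


def lerp(a, b, u):
    return a + (b - a) * u


def sample(keyframes, t):
    """Linearly interpolate (extent, azimuth, elevation) at time t.

    Times outside the keyframe range are clamped to the nearest end.
    Azimuth is interpolated without angle wrap-around, for simplicity.
    """
    ks = sorted(keyframes, key=lambda k: k.t)
    t = min(max(t, ks[0].t), ks[-1].t)
    for a, b in zip(ks, ks[1:]):
        if a.t <= t <= b.t:
            u = 0.0 if b.t == a.t else (t - a.t) / (b.t - a.t)
            return (
                lerp(SHOT_EXTENT[a.shot], SHOT_EXTENT[b.shot], u),
                lerp(a.azimuth_deg, b.azimuth_deg, u),
                lerp(a.elevation_deg, b.elevation_deg, u),
            )
    k = ks[0]  # only one keyframe was given
    return (SHOT_EXTENT[k.shot], k.azimuth_deg, k.elevation_deg)


def to_camera(extent, azimuth_deg, elevation_deg,
              subject_xyz, subject_height, facing_deg, vfov_deg=40.0):
    """Convert subject-relative framing into a world-space 6-DoF pose.

    Assumed conventions: y is up, the subject stands at subject_xyz,
    facing_deg is the subject's heading about the y axis (0 = +z), and the
    camera looks at the subject's mid-height point with zero roll.
    Returns (position, yaw_deg, pitch_deg, roll_deg).
    """
    sx, sy, sz = subject_xyz
    tx, ty, tz = sx, sy + 0.5 * subject_height, sz  # look-at target

    # Distance at which `extent * subject_height` fills the vertical FOV.
    half_view = 0.5 * extent * subject_height
    dist = half_view / math.tan(math.radians(vfov_deg) / 2.0)

    # Orbit around the subject: azimuth is relative to where the subject faces.
    a = math.radians(facing_deg + azimuth_deg)
    e = math.radians(elevation_deg)
    cx = tx + dist * math.cos(e) * math.sin(a)
    cy = ty + dist * math.sin(e)
    cz = tz + dist * math.cos(e) * math.cos(a)

    # Orientation from the camera-to-target direction.
    fx, fy, fz = tx - cx, ty - cy, tz - cz
    yaw = math.degrees(math.atan2(fx, fz))
    pitch = math.degrees(math.atan2(fy, math.hypot(fx, fz)))
    return (cx, cy, cz), yaw, pitch, 0.0


# Two sparse keyframes: a frontal full shot moving to a side-on medium shot.
keyframes = [Keyframe(0.0, "full", 0.0, 0.0), Keyframe(2.0, "medium", 90.0, 10.0)]
extent, az, el = sample(keyframes, 1.0)
pose = to_camera(extent, az, el, subject_xyz=(0.0, 0.0, 0.0),
                 subject_height=1.7, facing_deg=0.0)
```

Because the camera pose is recomputed from the subject's position and heading at each sampled time, the same keyframes would follow the actor as they move, which is the property the human-centric parameterization is meant to provide. A real system would likely need smoother interpolation and a shot-dependent look-at target (for example, the head in a close-up).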

We trained and evaluated the Auteur framework on a newly constructed dataset of 34,000 instances of aligned text, human motion, and DSL-annotated camera trajectories. The dataset combines procedural synthesis with real-world movie footage sourced from the CondensedMovies dataset. Auteur introduces cinematographic framing capabilities for human-centric scenes, a feature largely missing from previous generative models. To evaluate this capability, we developed new metrics focused specifically on framing quality. Our experimental results show that Auteur consistently surpasses existing methods on these metrics.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
