Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control
Title: Steering Video Generation Models Through Reduced-Order Linear Optimal Control
Abstract:
Text-to-video (T2V) models trained on large web-scale datasets often produce unintended or harmful content. This has motivated intervention techniques that suppress undesirable outputs while preserving visual quality. Activation steering offers a mechanistic alternative to prompt filtering and fine-tuning, but existing approaches for T2V models are limited: they typically apply broad, non-anticipatory interventions, which risk oversteering and degrading content quality.
To address these limitations, we introduce the Latent Activation Linear-Quadratic Regulator (LA-LQR), a reduced-order optimal control framework for minimally invasive steering of T2V models. Treating T2V inference as a dynamical system, LA-LQR computes closed-loop feedback interventions that guide activations toward target feature setpoints while penalizing unnecessary perturbations.
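For orientation, a setpoint-tracking LQR problem of the kind described above can be written in a standard discrete-time form. The notation here is an assumption made for illustration and is not taken from the paper: $z_t$ is a latent activation state, $u_t$ the steering input, $z^\ast$ the target setpoint, $A_t, B_t$ local linear dynamics, and $Q \succeq 0$, $Q_T \succeq 0$, $R \succ 0$ weights on tracking error and intervention size.

```latex
\begin{aligned}
z_{t+1} &= A_t z_t + B_t u_t, \\
J &= \sum_{t=0}^{T-1} \Big[ (z_t - z^\ast)^\top Q \,(z_t - z^\ast) + u_t^\top R \, u_t \Big]
     + (z_T - z^\ast)^\top Q_T \,(z_T - z^\ast), \\
u_t &= -K_t z_t + k_t .
\end{aligned}
```

Under these assumptions the minimizer of $J$ is an affine state-feedback law, with the gains $K_t$ and offsets $k_t$ obtained from a backward Riccati recursion. The $u_t^\top R\, u_t$ term is what discourages unnecessary perturbations: a larger $R$ yields gentler steering.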
To make optimal control computationally viable for high-dimensional video activations, we project them onto a low-dimensional, task-relevant subspace derived from contrastive prompt pairs. Within this latent space, we estimate local linear dynamics and solve a latent LQR problem to generate steering signals specific to each timestep and layer. We establish theoretical bounds that connect latent setpoint tracking to feature control in raw activation space, and we empirically validate the accuracy of the reduced latent dynamics. Evaluations on video safety and concept steering benchmarks show that LA-LQR significantly lowers the incidence of unsafe generations compared to baseline methods while retaining prompt fidelity and visual integrity.
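The three steps named in the abstract (contrastive subspace, latent linear dynamics, latent LQR) can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `contrastive_subspace`, `fit_latent_dynamics`, and `tracking_lqr_gains` are hypothetical, the subspace is taken from an SVD of paired activation differences, the dynamics are time-invariant and fitted by least squares, the input matrix `B` is assumed to be the identity, and all weights and dimensions are arbitrary.

```python
import numpy as np

def contrastive_subspace(pos_acts, neg_acts, rank):
    """Orthonormal basis (d x rank) spanning the top directions of the
    paired activation differences (assumed construction)."""
    diffs = pos_acts - neg_acts                      # (n_pairs, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank].T

def fit_latent_dynamics(z_now, z_next):
    """Least-squares fit of A such that z_next ≈ z_now @ A.T."""
    a_transposed, *_ = np.linalg.lstsq(z_now, z_next, rcond=None)
    return a_transposed.T

def tracking_lqr_gains(A, B, Q, R, z_star, horizon):
    """Finite-horizon LQR gains for tracking z_star.

    The state is augmented with a constant 1 so that the tracking cost
    (z - z*)^T Q (z - z*) becomes a plain quadratic form in [z; 1].
    Returns gains K_0..K_{T-1}; the control is u_t = -K_t @ [z_t; 1].
    """
    r, m = A.shape[0], B.shape[1]
    A_aug = np.block([[A, np.zeros((r, 1))],
                      [np.zeros((1, r)), np.ones((1, 1))]])
    B_aug = np.vstack([B, np.zeros((1, m))])
    E = np.hstack([np.eye(r), -z_star[:, None]])     # E @ [z; 1] = z - z*
    Q_aug = E.T @ Q @ E
    P = Q_aug.copy()                                 # terminal weight = Q
    gains = []
    for _ in range(horizon):                         # backward Riccati sweep
        K = np.linalg.solve(R + B_aug.T @ P @ B_aug, B_aug.T @ P @ A_aug)
        P = Q_aug + A_aug.T @ P @ (A_aug - B_aug @ K)
        gains.append(K)
    return gains[::-1]

# --- synthetic demo (all sizes and values are illustrative) ---
rng = np.random.default_rng(0)
d, r, n_pairs, T = 64, 4, 32, 20

# 1) Subspace from contrastive pairs that differ along r hidden directions.
U_true = np.linalg.qr(rng.normal(size=(d, r)))[0]
neg = 0.01 * rng.normal(size=(n_pairs, d))
pos = neg + rng.normal(size=(n_pairs, r)) @ U_true.T
U = contrastive_subspace(pos, neg, r)

# 2) Latent dynamics fitted from (z_t, z_{t+1}) samples.
A_true = 0.9 * np.eye(r) + 0.02 * rng.normal(size=(r, r))
z_now = rng.normal(size=(200, r))
z_next = z_now @ A_true.T
A_hat = fit_latent_dynamics(z_now, z_next)

# 3) Latent LQR toward a setpoint, then closed-loop vs. open-loop rollout.
B = np.eye(r)
Q, R = np.eye(r), 0.1 * np.eye(r)
z_star = np.ones(r)
gains = tracking_lqr_gains(A_hat, B, Q, R, z_star, T)

z_ctrl = np.zeros(r)
z_open = np.zeros(r)
for K in gains:
    u = -K @ np.append(z_ctrl, 1.0)                  # feedback on current state
    z_ctrl = A_true @ z_ctrl + B @ u                 # steered latent
    z_open = A_true @ z_open                         # unsteered latent
    # In raw activation space the perturbation would be U @ u.

err_ctrl = np.linalg.norm(z_ctrl - z_star)
err_open = np.linalg.norm(z_open - z_star)
```

Augmenting the state with a constant is one common way to handle a nonzero setpoint with an ordinary Riccati recursion; an equivalent formulation carries an explicit feedforward term instead. Because the control is recomputed from the current latent state at every step, the intervention shrinks as the state approaches the setpoint, which is the sense in which closed-loop feedback avoids oversteering.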
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC





