Knowledge-Preserved Model Tuning in Null-Space for Robust Spatio-Temporal Video Grounding
Title: Null-Space Model Tuning for Preserving Knowledge in Robust Spatio-Temporal Video Grounding
Abstract: Spatio-Temporal Video Grounding localizes an object tube in a video from a textual query. Although recent methods have achieved strong results, most assume high-quality (HQ) inputs and overlook the low-quality (LQ) videos that are common in practical applications. Parameter-efficient tuning techniques such as LoRA can adapt a model to degraded inputs, but they tend to compromise pre-trained knowledge. To resolve this conflict, we introduce Null-Space Tuning (NST). The framework builds on a geometric property: adding a vector from the null-space of a frozen weight matrix to the layer input leaves the layer output unchanged. Using this property, NST embeds learnable residuals into the input features that can remain undetected by the pre-trained backbone. Specifically, NST employs a Quality-Adaptive Unit together with Dual-Space Reparameterization to generate these residuals. It confines the residual components produced for HQ inputs to the null-space, while directing the restoration components for LQ inputs into the non-null-space. Because the frozen weights nullify any component within the null-space, NST corrects degraded inputs without altering the pre-trained knowledge associated with HQ inputs. Comprehensive experiments show that NST surpasses current state-of-the-art methods on our Mixed-Quality benchmark.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
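The geometric property the abstract relies on can be checked numerically. The following is a minimal sketch, not the authors' implementation: the weight matrix `W`, the residual `r`, the helper `null_space_projector`, and all dimensions are hypothetical stand-ins, and the Quality-Adaptive Unit and Dual-Space Reparameterization are not reproduced. It assumes a frozen linear layer with more input than output dimensions, so that an exact null-space exists; real pre-trained weights may be only approximately rank-deficient.

```python
import numpy as np


def null_space_projector(W, tol=1e-10):
    """Return the orthogonal projector onto null(W) (hypothetical helper)."""
    # Right singular vectors beyond the numerical rank span the null-space.
    _, s, vt = np.linalg.svd(W, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    null_basis = vt[rank:].T  # shape: (d_in, d_in - rank)
    return null_basis @ null_basis.T


rng = np.random.default_rng(0)
d_out, d_in = 4, 8                       # assumed sizes, chosen so null(W) is non-trivial
W = rng.standard_normal((d_out, d_in))   # stand-in for a frozen weight matrix
x = rng.standard_normal(d_in)            # stand-in input feature
r = rng.standard_normal(d_in)            # stand-in for a learned residual

P_null = null_space_projector(W)
r_null = P_null @ r        # component the frozen layer cannot see
r_non_null = r - r_null    # component that does change the output

out_clean = W @ x
out_null = W @ (x + r_null)          # matches out_clean up to rounding
out_non_null = W @ (x + r_non_null)  # differs from out_clean

print("null-space residual changes output:", not np.allclose(out_clean, out_null))
print("non-null residual changes output:  ", not np.allclose(out_clean, out_non_null))
```

In this toy setting, a residual confined to the null-space leaves the layer output unchanged, while its complementary component alters it. This mirrors the split the abstract describes between residuals for HQ inputs and restoration components for LQ inputs.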





