ReConFuse: Reconstruction-Error Guided Semantic Fusion for AI-Generated Video Detection
Title: ReConFuse: Leveraging Reconstruction Error for Semantic Fusion in AI Video Detection
Abstract:
As AI-generated videos grow increasingly indistinguishable from real footage, they pose significant risks to media integrity, content authenticity, and public trust. Consequently, developing robust methods for detecting AI-generated video is critical for multimedia forensics. However, this task is difficult because detectors must account for both spatial artifacts and temporal dynamics while adapting to rapidly evolving generative models. This study investigates reconstruction error as a forensic signal for identifying AI-generated content. We use a pretrained WF-VAE to reconstruct input videos and find that real and synthetic videos display distinct frame-by-frame reconstruction error patterns. These discrepancies suggest that reconstruction errors can effectively highlight differences in data distribution. Nevertheless, adapting image-based detection methods to video is complex, as video reconstruction errors are temporally structured and require semantic context for accurate analysis. To overcome these hurdles, we introduce ReConFuse, a framework that guides semantic fusion through reconstruction errors for video-level detection. ReConFuse extracts reconstruction error cues from WF-VAE outputs, aligns them with multi-frame semantic features, and employs a Mamba-based module to model their temporal evolution for final classification. Our experiments, conducted across various generative models and evaluation scenarios, demonstrate ReConFuse’s effectiveness and its strong generalization capabilities.
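To make the frame-by-frame reconstruction error signal described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a video stored as a NumPy array of shape (T, H, W, C) and uses per-frame mean squared error as the error measure, which is an assumption, since the abstract does not specify one. The names `framewise_reconstruction_error` and `toy_reconstruct` are hypothetical, and `toy_reconstruct` is a simple quantizer standing in for the pretrained WF-VAE.

```python
import numpy as np


def framewise_reconstruction_error(video, reconstruct):
    """Return one mean-squared-error value per frame.

    video: array of shape (T, H, W, C) with float pixel values.
    reconstruct: callable returning an array of the same shape; in the
        paper's setting this role is played by a pretrained video VAE
        (assumption: any encode/decode round trip can be plugged in here).
    """
    video = np.asarray(video, dtype=np.float64)
    recon = np.asarray(reconstruct(video), dtype=np.float64)
    if recon.shape != video.shape:
        raise ValueError("reconstruction must have the same shape as the input")
    # Average the squared error over height, width, and channels,
    # keeping the time axis so the result is a length-T error curve.
    return ((video - recon) ** 2).mean(axis=(1, 2, 3))


def toy_reconstruct(video):
    """Hypothetical lossy stand-in for a VAE: quantize values to steps of 0.25."""
    return np.round(video * 4.0) / 4.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((8, 16, 16, 3))  # 8 synthetic frames of noise
    errors = framewise_reconstruction_error(video, toy_reconstruct)
    print(errors.shape)  # (8,)
```

The resulting length-T error curve is the kind of temporally structured signal that a downstream sequence model could consume alongside semantic features. How ReConFuse performs that alignment and fusion is not detailed in the abstract and is not sketched here.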
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




