Explainable Forensics of Manipulated Segments in Untrimmed Long Videos
Abstract
While AI-driven video generation has revolutionized content creation, it has also exacerbated the threat of misinformation by enabling localized manipulations to be inserted into lengthy videos. Current forensic techniques primarily analyze short, isolated clips, which makes them ineffective in realistic scenarios where synthetic content is sparsely hidden within genuine footage. To address this limitation, we define the task of Temporal AI-Generated Segment Localization and Explanation (TASLE), which involves detecting authenticity, pinpointing the temporal location of manipulated portions within untrimmed long videos, and providing an interpretable analysis of them. We also present the TASLE benchmark, which comprises 12,472 untrimmed videos with varied manipulation patterns and detailed annotations, including temporal boundaries, authenticity labels, and segment-level rationales. Furthermore, we introduce MSLoc, a coarse-to-fine forensic baseline that combines a boundary-sensitive proposal generation module, which scans long videos efficiently, with an MLLM-based refinement module, which provides precise boundary localization and interpretable reasoning. Experimental results confirm the efficacy of this baseline and underscore the critical role of segment-level explainable forensics in analyzing long-form AI-generated videos. The dataset and code are publicly accessible at https://debby-0527.github.io/TASLE.
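To make the annotation types named in the abstract concrete, the sketch below models one segment record (temporal boundaries, authenticity label, rationale) and scores a predicted segment against a reference with temporal intersection-over-union, a common measure in temporal localization. Everything here is an assumption for illustration: the `Segment` class, its field names, the example values, and the choice of temporal IoU are hypothetical and are not taken from the TASLE release or its evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one annotated or predicted segment."""
    start: float          # segment start, in seconds
    end: float            # segment end, in seconds
    is_generated: bool    # authenticity label for the segment
    rationale: str = ""   # free-text, segment-level explanation


def temporal_iou(a: Segment, b: Segment) -> float:
    """Return the overlap of two segments divided by their combined extent."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Made-up values, used only to exercise the function.
reference = Segment(12.0, 20.0, True, "illustrative rationale text")
predicted = Segment(14.0, 22.0, True)
iou = temporal_iou(reference, predicted)
disjoint_iou = temporal_iou(Segment(0.0, 1.0, True), Segment(2.0, 3.0, True))
```

In this example the two segments overlap for 6 seconds out of a combined 10-second extent, so `iou` is 0.6, while segments that do not overlap score 0.0.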
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





