CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA

Abstract:

This study investigates timestamped question answering over educational lecture videos under strict single-GPU memory and latency constraints. Given a natural-language query, the system retrieves relevant timestamped video segments and generates a grounded answer. We introduce CourseTimeQA, a dataset of 902 queries across six courses totaling 52.3 hours of content, and CrossFusion-RAG, a lightweight cross-modal retriever designed to meet the latency budget. CrossFusion-RAG combines frozen encoders, a learned projection of visual features from 512 to 768 dimensions, shallow query-agnostic cross-attention over ASR text and video frames, a temporal-consistency regularizer, and a compact cross-attentive reranker.
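The fusion step described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the random initialization, the single attention head, the absence of separate query/key/value projections, and the residual connection are all assumptions made for illustration. Only the 512-to-768 projection and the query-agnostic design (ASR text attends to frames at indexing time, without seeing the user query) come from the abstract.

```python
import math
import random


def make_projection(in_dim=512, out_dim=768, seed=0):
    """Hypothetical stand-in for the learned visual projection (random here, not trained)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project_frames(frames, weight):
    """Map each frame embedding from in_dim to out_dim with a plain matrix-vector product."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight] for frame in frames]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(text_tokens, frame_feats):
    """Single-head cross-attention: each ASR token attends over the projected frames.

    No user query is involved, so the fused segment representation can be
    computed once and stored in the index (the query-agnostic property).
    """
    dim = len(text_tokens[0])
    fused = []
    for token in text_tokens:
        scores = [sum(a * b for a, b in zip(token, key)) / math.sqrt(dim) for key in frame_feats]
        weights = softmax(scores)
        context = [sum(w * key[j] for w, key in zip(weights, frame_feats)) for j in range(dim)]
        # Residual connection (an assumption): keep the text signal, add visual context.
        fused.append([t + c for t, c in zip(token, context)])
    return fused
```

With real encoders, `frames` would hold 512-dimensional frame embeddings and `text_tokens` 768-dimensional ASR token embeddings for one segment; the fused vectors would then be pooled and indexed.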

On CourseTimeQA, CrossFusion-RAG outperforms a strong BLIP-2 baseline by 0.08 in Mean Reciprocal Rank (MRR) and 0.10 in nDCG@10, while maintaining a median end-to-end latency of approximately 1.55 seconds on a single A100 GPU. We compare against several closely related methods under identical hardware and indexing conditions: zero-shot CLIP with multi-frame pooling, CLIP with a cross-encoder reranker and maximal marginal relevance (MMR), learned late-fusion gating, text-only hybrid retrieval with cross-encoder reranking (with and without MMR), caption-augmented text retrieval, and non-learned temporal smoothing. To support reproducible research, we provide full training and tuning details, robustness analyses under ASR noise (by WER quartile), and temporal-localization diagnostics.
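For readers unfamiliar with the two reported metrics, the following sketch shows the standard definitions of MRR and nDCG@10 (with linear gain). The function names and the segment identifiers are hypothetical, and the paper's exact evaluation protocol (for example, how a timestamped segment is judged relevant) is not stated in the abstract, so this is only an illustration of the formulas.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant segment, or 0.0 if none is retrieved."""
    for rank, seg_id in enumerate(ranked_ids, start=1):
        if seg_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, gains, k=10):
    """nDCG@k with linear gain; `gains` maps segment id to graded relevance."""
    dcg = sum(
        gains.get(seg_id, 0.0) / math.log2(pos + 2)
        for pos, seg_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(pos + 2) for pos, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical segment ids of the form "<lecture>@<start time>".
runs = [
    (["lec1@00:05:10", "lec1@00:12:40"], {"lec1@00:05:10"}),
    (["lec2@00:01:00", "lec2@00:30:15"], {"lec2@00:30:15"}),
]
mrr = mean_reciprocal_rank(runs)
```

An MRR gain of 0.08 therefore means the first correct segment appears noticeably higher in the ranking on average, while nDCG@10 also rewards placing multiple relevant segments near the top.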


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
