The DeepSpeak-Agentic Dataset
Title: Introducing the DeepSpeak-Agentic Dataset
Abstract
This paper introduces DeepSpeak-Agentic, a comprehensive video dataset containing more than 37 hours of semi-structured dialogues between human participants and embodied AI agents. We use this dataset to assess the automatic forensic identification of AI agents across audio, video, and text modalities, to investigate the dynamics of human-agent interaction, and to establish a benchmark that drives progress in large language models and in the AI-generated voices and faces that make embodied agents possible. We also present a scalable data-capture infrastructure that generates agents, automatically pairs them with human crowd workers, records their audiovisual exchanges within predefined scenarios, and isolates the human and agent signals from the combined feed.
Source: arXiv. Generated at: 2026-06-03 00:00:00 UTC



