Context-aware child-directed speech detection from long-form recordings
Title: Detecting Child-Directed Speech in Extended Recordings Through Contextual Awareness
Abstract:
Automatically distinguishing child-directed from adult-directed speech in long-form audio recordings is essential for scalable analysis of children’s linguistic environments. Current methods typically classify each utterance in isolation and have largely been tested on English-only data. This study addresses these gaps through three contributions. First, we fine-tune and evaluate six self-supervised models on a multilingual dataset comprising recordings from 182 children. Our results indicate that models pre-trained on child-centered recordings significantly outperform models trained on adult speech. Second, we show that including the surrounding context of an utterance markedly improves classification, yielding an absolute improvement of 13.8% in the average F1-score. Third, we evaluate the model in a practical end-to-end workflow, from adult speech detection to addressee classification. Although performance declines when the model relies on automatic rather than manual segmentation, it still consistently surpasses a rule-based baseline.
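To make the idea of "surrounding contextual information" concrete, the following is a minimal sketch of how a context window around a target utterance might be assembled before classification. It is not the authors' method: the `Utterance` type, the `context_window` helper, the speaker labels, and the 10-second margin are all hypothetical choices made for illustration, since the abstract does not specify how context is defined.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """Hypothetical segment record: times in seconds plus a speaker label."""
    start: float
    end: float
    speaker: str


def context_window(
    utterances: List[Utterance], index: int, context_s: float = 10.0
) -> Tuple[Tuple[float, float], List[Utterance]]:
    """Return the audio span and neighbouring utterances around a target.

    The span extends `context_s` seconds before and after the target
    utterance (an assumed margin, not a value taken from the paper) and is
    clipped at the start of the recording.
    """
    target = utterances[index]
    lo = max(target.start - context_s, 0.0)
    hi = target.end + context_s
    # Keep every other utterance that overlaps the extended span.
    neighbours = [
        u
        for i, u in enumerate(utterances)
        if i != index and u.end > lo and u.start < hi
    ]
    return (lo, hi), neighbours
```

A classifier working on isolated utterances would see only the target segment, whereas a context-aware one could also receive the audio in the returned span or features derived from the neighbouring utterances, such as whether a child vocalizes shortly afterwards.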
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





