Child-directed speech facilitates production, not comprehension, in BabyLMs
Title: Child-directed speech enhances production capabilities rather than comprehension in BabyLMs
Abstract:
Recent research suggests that child-directed speech (CDS) may hinder language learning in BabyLMs, but existing evaluations have largely prioritized comprehension over production. This gap matters because usage-based theories of language acquisition emphasize production and posit that CDS supports early language development by providing constructional "frames": recurring lexical patterns with open slots. To address it, we present a new generation-based evaluation framework inspired by these theories, built around a frame-completion task. We benchmark Llama models trained on three datasets (CDS, the BabyLM corpus, and web-crawled data from FineWeb-edu) on standard comprehension benchmarks alongside our framework. Our analysis reveals a dissociation between comprehension and production: although models trained on FineWeb-edu perform better on minimal-pair tasks, models trained on CDS generate grammatically correct completions significantly earlier in training and assign higher probability to suitable slot-fillers. These results indicate that standard comprehension metrics fail to capture the full benefits that CDS provides to BabyLMs.
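To make the idea of scoring slot-fillers in a frame concrete, here is a minimal sketch. It is not the paper's evaluation: a toy add-one-smoothed bigram model stands in for the Llama models, and the corpus, the frame "where is the ___", the candidate fillers, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for child-directed speech.
CORPUS = [
    "where is the ball",
    "where is the dog",
    "where is the ball",
    "look at the dog",
    "look at the ball",
]


def train_bigram(sentences):
    """Count bigrams over whitespace tokens, with sentence boundary markers."""
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def log_prob(sentence, counts, vocab):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = counts[prev][cur] + 1
        denominator = sum(counts[prev].values()) + len(vocab)
        total += math.log(numerator / denominator)
    return total


def rank_fillers(frame, fillers, counts, vocab, slot="___"):
    """Fill the open slot with each candidate and sort by model log-probability."""
    scored = [
        (filler, log_prob(frame.replace(slot, filler), counts, vocab))
        for filler in fillers
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


counts, vocab = train_bigram(CORPUS)
ranking = rank_fillers("where is the ___", ["ball", "dog", "look"], counts, vocab)
for filler, score in ranking:
    print(f"{filler}: {score:.3f}")
```

In this toy setting, the nouns that occur in the slot in the corpus outrank the verb "look". With a neural language model, the same comparison would sum token log-probabilities from the model rather than from bigram counts.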
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





