Deep networks learn to parse uniform-depth context-free languages from local statistics

June 2, 2026 · Jack T. Parley, Francesco Cagnetta, Matthieu Wyart

Title: Deep Networks Acquire the Ability to Parse Uniform-Depth Context-Free Languages via Local Statistical Cues

Abstract: A central question in both machine learning and cognitive science is how linguistic structure can be acquired from sentence data alone. Studies of the internal representations of Large Language Models (LLMs) indicate that they can parse text during next-word prediction and capture semantic concepts distinct from surface forms, yet the data statistics that make these capabilities learnable, and the amount of training data they require, remain poorly understood. Probabilistic context-free grammars (PCFGs) provide a tractable testbed for these questions. Previous studies have either characterized, after the fact, the parsing-like algorithms used by trained networks, or examined the learnability of PCFGs with fixed syntax, where parsing is not required. This study addresses these gaps by (i) presenting a tunable class of PCFGs in which both the level of ambiguity and the correlation structure across scales can be controlled; (ii) introducing a learning mechanism, an inference algorithm inspired by the architecture of deep convolutional networks, that links learnability and sample complexity to specific language statistics; and (iii) empirically confirming these predictions with both transformers and deep convolutional networks. The resulting picture is a unified one: correlations at different scales resolve local ambiguities, which enables a hierarchical representation of the data to emerge.
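
To make "uniform-depth" concrete, here is a minimal Python sketch of a toy grammar in which every leaf sits at the same depth while production lengths vary, so a sentence's length does not reveal its tree. This is an illustration only and not the PCFG class introduced in the paper: the function names (`make_rules`, `sample_sentence`), the uniform choice among rules, and all parameter values are assumptions made for the example.

```python
import random


def make_rules(vocab_size, rules_per_symbol, branchings, rng):
    """Draw random productions for one level of the hierarchy.

    Hypothetical helper: each parent symbol gets `rules_per_symbol`
    right-hand sides whose length is picked from `branchings`.
    """
    rules = {}
    for parent in range(vocab_size):
        rules[parent] = [
            tuple(rng.randrange(vocab_size) for _ in range(rng.choice(branchings)))
            for _ in range(rules_per_symbol)
        ]
    return rules


def sample_sentence(rules_by_level, root, rng):
    """Expand the root once per level, so all leaves share the same depth."""
    symbols = [root]
    for rules in rules_by_level:
        expanded = []
        for symbol in symbols:
            # Assumption: rules of a symbol are chosen uniformly at random.
            expanded.extend(rng.choice(rules[symbol]))
        symbols = expanded
    return symbols


rng = random.Random(0)
depth = 3
rules_by_level = [make_rules(4, 2, (2, 3), rng) for _ in range(depth)]
sentence = sample_sentence(rules_by_level, root=0, rng=rng)
print(len(sentence), sentence)
```

Because each production has two or three children, a depth-3 sentence has between 8 and 27 tokens, and the boundaries between constituents are not fixed in advance. Because the rules are drawn at random, the same string of children can appear under more than one parent, which is one simple way local ambiguity can arise.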

Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC