arXiv

Deep networks learn to parse uniform-depth context-free languages from local statistics

Title: Deep Networks Acquire the Ability to Parse Uniform-Depth Context-Free Languages via Local Statistical Cues

Abstract: A central question in both machine learning and cognitive science is how linguistic structure can be acquired from sentence data alone. Studies of the internal representations of Large Language Models (LLMs) indicate that they can parse text during next-word prediction and capture semantic concepts distinct from surface forms, but the specific data statistics that facilitate these capabilities, and the volume of training data they require, remain poorly understood. Probabilistic context-free grammars (PCFGs) serve as a tractable experimental platform for investigating these issues. Previous studies have either analyzed, after the fact, the parsing-like algorithms employed by trained networks or examined the learnability of PCFGs with static syntax, a scenario where parsing is not required. This study addresses these gaps by (i) presenting a flexible class of PCFGs in which the level of ambiguity and the cross-scale correlation structure can be manipulated; (ii) introducing a learning mechanism, an inference algorithm modeled after deep convolutional network architectures, that connects learnability and sample complexity to distinct language statistics; and (iii) empirically confirming these predictions using both transformer-based and deep convolutional architectures. We propose a comprehensive framework suggesting that correlations across various scales resolve local ambiguities, thereby fostering the development of hierarchical data representations.
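To make the notion of a uniform-depth PCFG concrete, the following is a minimal sketch of a toy sampler. It is not the construction used in the paper. The names `make_rules` and `sample_sentence`, the binary branching, the uniform rule probabilities, and the parameter values are all assumptions chosen for illustration. Every symbol is expanded exactly once per level, so all leaves sit at the same depth. Because child pairs are drawn at random, different parents can share a production, which is one simple source of local ambiguity.

```python
import random

def make_rules(vocab_size, rules_per_symbol, rng):
    """Hypothetical helper: give each parent symbol a list of binary
    productions, each a pair of child symbols drawn at random."""
    return {
        parent: [
            (rng.randrange(vocab_size), rng.randrange(vocab_size))
            for _ in range(rules_per_symbol)
        ]
        for parent in range(vocab_size)
    }

def sample_sentence(root, rules_by_level, rng):
    """Expand the root through every level in turn, so that all leaves
    end up at the same depth (the 'uniform-depth' property)."""
    symbols = [root]
    for rules in rules_by_level:
        expanded = []
        for symbol in symbols:
            # Pick one production uniformly at random (assumed probabilities).
            expanded.extend(rng.choice(rules[symbol]))
        symbols = expanded
    return symbols

# Illustrative parameters only; none of these values come from the paper.
rng = random.Random(0)
depth, vocab_size, rules_per_symbol = 3, 4, 2
rules_by_level = [
    make_rules(vocab_size, rules_per_symbol, rng) for _ in range(depth)
]
sentence = sample_sentence(0, rules_by_level, rng)
```

With binary branching and `depth` levels, each sampled sentence has `2 ** depth` tokens. Raising `rules_per_symbol` relative to `vocab_size` makes shared productions more likely, which is a crude stand-in for the ambiguity control the abstract describes.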


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
