Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
Title: Investigating Minimalist Phase Architecture in Large Language Models: The Limitations of Universal Dependencies
Abstract:
Structural probes rely on Universal Dependencies (UD), a framework that fails to capture formal-syntactic abstractions such as phase boundaries or the internal cohesion within phases. Consequently, whether large language models (LLMs) encode these concepts remains an unresolved question, one that UD-based probing methods are structurally incapable of addressing. To investigate this, we applied structural probes to wh-movement stimuli, specifically selecting cases where UD distances remain constant across different conditions by design. Any observed non-zero effect in these scenarios must therefore stem from syntactic structures that lie beyond the scope of UD.
Our study examined three conditions (bare small clauses, infinitivals, and finite clauses), ranked by the number of Minimalist Program (MP) phase boundaries traversed by the wh-element. In an analysis spanning 13 LLMs from four distinct families, we identified a phase-count gradient in cross-clause pairs, a pattern present in 12 out of 13 models. Additionally, we observed a sign asymmetry in within-clause pairs in all 13 models. Notably, the UD distance for these within-clause pairs is identical across all conditions, yet the asymmetry persists. This specific phenomenon aligns with the MP abstraction of phase-internal cohesion, a structural feature inherently invisible to UD.
Furthermore, activation patching techniques confirmed that these representations are causally active in 12 of the 13 models. These results indicate that distributional pretraining has the capacity to induce representations that correspond to formal-syntactic abstractions exceeding the reach of annotation-based probing. Ultimately, UD-grounded probes serve as a lower bound for syntactic encoding rather than an upper limit.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





