Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt
Abstract:
Usage-based theories of grammar suggest that the creative potential of language structures is shaped by two opposing frequency signals. The first is entrenchment, which arises from frequent usage. The second is preemption, which occurs when a specific linguistic structure is never observed in a context where it would be expected. Large Language Models (LLMs) are also usage-based, because they acquire language structures through exposure to massive text corpora. We therefore investigated whether entrenchment and preemption likewise encourage and constrain linguistic productivity in LLMs.
Our findings are consistent across model architectures and indicate that larger models can identify and reproduce constructional productivity with nonce words. Specifically, they exhibit entrenchment through coercion, a process in which the surrounding constructional context forces an atypical interpretation of a lexical item. However, we also show that even the most advanced models fail to apply negative evidence to novel language. Consequently, statistical preemption does not prevent models from overgeneralizing patterns that are semantically appropriate but never attested in the training data.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





