arXiv

Rethinking the Idiomaticity Decomposability Hypothesis: Evidence from Distributional Learning

June 3, 2026 · Maggie Mi, Golzar Atefi, Atsuki Yamaguchi, Felix Gers, Aline Villavicencio, Nafise Sadat Moosavi

Title: Re-evaluating the Decomposability of Idioms: Insights from Distributional Learning

Abstract:

Idioms are often categorized by their decomposability: the degree to which the meanings of their individual parts contribute to the overall figurative meaning. A long-standing hypothesis holds that decomposability predicts an idiom's syntactic flexibility, whereas usage-based theories argue that such behavior is better explained by distributional experience, including speaker familiarity and predictability. To test these competing accounts, we use contextualized language models as controlled distributional learners. We introduce an internal, model-based measure of decomposability, analyze its relationship with human ratings, syntactic flexibility, and predictability, and track how idiom representations evolve during pretraining. Model-derived decomposability correlates only weakly with human judgments and shows a modest but consistent negative association with syntactic flexibility. Analysis across pretraining further shows that the stabilization of idiom representations cannot be attributed to frequency alone: surprisal, decomposability, and frequency all contribute, with decomposability showing the strongest training-dependent effect.
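The abstract does not specify how the model-based decomposability measure is computed. As a purely illustrative sketch, assuming a common compositionality proxy (cosine similarity between an idiom's contextual embedding and the mean of its constituent-word embeddings, which may well differ from the authors' metric), the snippet below shows how such a score, token surprisal (-log2 p), and a Spearman rank correlation against human ratings could be computed. All function names (`cosine`, `decomposability_proxy`, `surprisal`, `average_ranks`, `pearson`, `spearman`) and the toy numbers are hypothetical; in practice the vectors and probabilities would come from a language model.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def decomposability_proxy(idiom_vec: Sequence[float],
                          part_vecs: List[Sequence[float]]) -> float:
    """Hypothetical proxy (NOT the paper's metric): cosine similarity between
    the idiom's contextual vector and the mean of its parts' vectors.
    Higher values suggest the parts contribute more to the whole."""
    dim = len(idiom_vec)
    mean = [sum(p[i] for p in part_vecs) / len(part_vecs) for i in range(dim)]
    return cosine(idiom_vec, mean)


def surprisal(prob: float) -> float:
    """Surprisal in bits of a token with model probability `prob`: -log2 p."""
    return -math.log2(prob)


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up toy numbers for illustration only; not data from the paper.
    model_scores = [0.82, 0.41, 0.67, 0.15]
    human_ratings = [4.5, 3.0, 2.5, 1.0]
    print(f"Spearman rho: {spearman(model_scores, human_ratings):.3f}")
    print(f"Surprisal of p=0.125: {surprisal(0.125):.1f} bits")
```

The same `spearman` helper could relate model scores to a syntactic-flexibility measure. Rank correlation is chosen here because it assumes only a monotonic relationship; the abstract does not state which statistics the authors actually used.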

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC