Multilinguality of Large Language Models From a Structural Perspective
Title: A Structural Analysis of Multilinguality in Large Language Models
Abstract: Although English dominates their training data, large language models (LLMs) show remarkable proficiency in many languages, acquired through pre- and post-training on multilingual corpora. Previous research on token representations has offered insights into how LLMs process non-English text, but it has overlooked the structural perspective, which is fundamental to the nature of language. This paper investigates LLM multilinguality through representational structural analysis. Our results indicate that low-resource languages exhibit greater structural divergence from English than high- and mid-resource languages do. Furthermore, we find that language-specific post-training modifies these structures without disrupting the underlying inter-language relationships.
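The abstract does not say how structural divergence is measured. As a hypothetical illustration only, the sketch below uses an RSA-style (representational similarity analysis) comparison, which is one common way to compare representational structure. It is not presented as the paper's method. The function names `similarity_structure` and `structural_divergence` are made up for this example, and the embeddings are synthetic random arrays rather than real LLM hidden states. Each language's structure is the matrix of cosine similarities among a set of parallel items. Divergence is taken to be 1 minus the Pearson correlation of the off-diagonal entries of two such matrices.

```python
import numpy as np

# Hypothetical illustration only: an RSA-style measure of "structural divergence".
# The abstract does not describe the paper's actual metric.


def similarity_structure(embeddings: np.ndarray) -> np.ndarray:
    """Cosine-similarity matrix between the rows of an (n_items, dim) array."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T


def structural_divergence(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """1 - Pearson correlation of the off-diagonal similarities of two languages.

    Row i of emb_a and row i of emb_b are assumed to encode the same item
    (e.g. a word and its translation), so the two matrices are comparable.
    """
    if emb_a.shape[0] != emb_b.shape[0]:
        raise ValueError("both languages need the same number of parallel items")
    sim_a = similarity_structure(emb_a)
    sim_b = similarity_structure(emb_b)
    upper = np.triu_indices(sim_a.shape[0], k=1)  # skip the diagonal of ones
    r = np.corrcoef(sim_a[upper], sim_b[upper])[0, 1]
    return float(1.0 - r)


# Synthetic stand-ins for hidden states of 50 parallel items (no real model data).
rng = np.random.default_rng(0)
english = rng.normal(size=(50, 64))
rotation, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal matrix

rotated = english @ rotation  # same geometry, different coordinates
noisy = english + rng.normal(scale=0.5, size=english.shape)  # structure partly kept
unrelated = rng.normal(size=(50, 64))  # no shared structure

print(structural_divergence(english, rotated))  # ~0: rotation keeps all cosines
print(structural_divergence(english, noisy))
print(structural_divergence(english, unrelated))
```

Cosine similarity is unchanged by an orthogonal rotation, so a measure like this compares the relations among items rather than the coordinates of individual representations. That is one way to read the contrast the abstract draws between token-level and structural analyses.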
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




