The Mechanistic Emergence of Symbol Grounding in Language Models
Abstract
Symbol grounding, a concept introduced by Harnad (1990), describes how linguistic symbols, such as words, acquire meaning by connecting to real-world sensorimotor experiences. Recent studies have offered preliminary evidence that grounding may emerge spontaneously in large-scale vision-language models, even without explicit grounding objectives. Yet where in the model this emergence occurs, and which mechanisms drive it, remain largely unexplored. To address this gap, we present a controlled evaluation framework that uses mechanistic and causal analysis to systematically trace how symbol grounding develops within a model's internal computations.
Our investigation shows that grounding is concentrated primarily in the middle layers of the network. It is implemented through an aggregate mechanism, in which attention heads aggregate information from the environmental ground to support the prediction of linguistic forms. This behavior is consistent across multimodal dialogue settings and across architectures, including Transformers and state-space models, but it is notably absent in unidirectional LSTMs. These findings provide behavioral and mechanistic evidence that symbol grounding can emerge in language models. They also carry practical implications, particularly for predicting, and potentially controlling, the reliability of generated outputs.
Source: arXiv. Generated at: 2026-06-04 00:00:00 UTC