arXiv

Learning Concepts, Not Tokens: Self-Supervised Semantic Alignment for Language Models

June 3, 2026 · Christine Zhang, Dan Jurafsky, Chen Shani

Title: Learning Concepts, Not Tokens: Self-Supervised Semantic Alignment for Language Models

Abstract:

Standard next-token prediction (NTP) objectives train language models to predict a single specific token at every step, even though multiple distinct continuations can convey the same meaning. For instance, in the phrase "this sticker can be placed here," the words "positioned," "attached," and "put" are valid, semantically equivalent alternatives to "placed." Conventional NTP training treats these interchangeable options as mutually exclusive targets. We instead investigate a self-supervised approach that trains models to predict underlying concepts, approximated as sets of semantically equivalent tokens. With this concept-based supervision, models align more closely with human similarity judgments and improve on classification, clustering, and reranking tasks, while matching or exceeding existing methods on reasoning. These gains come with lower perplexity on semantically significant terms (see Section 3.2) and only negligible increases in overall perplexity, indicating that concept-level supervision improves semantic alignment without compromising general language modeling ability. Our source code is available at https://anonymous.4open.science/r/learning-concepts-9025 .
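To make the contrast between token-level and concept-level targets concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible formulation in which the concept loss is the negative log of the total probability assigned to any token in the concept set; the abstract does not specify the exact objective. The function names (`softmax`, `ntp_loss`, `concept_loss`), the toy vocabulary, and the logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ntp_loss(logits, target_id):
    """Standard next-token loss: only the single gold token counts as correct."""
    probs = softmax(logits)
    return -math.log(probs[target_id])


def concept_loss(logits, concept_ids):
    """Assumed concept-level loss: any token in the concept set counts as correct.

    The probability mass of all semantically equivalent tokens is summed
    before taking the negative log.
    """
    probs = softmax(logits)
    return -math.log(sum(probs[i] for i in set(concept_ids)))


# Hypothetical toy vocabulary for the slot in "this sticker can be ___ here".
vocab = ["placed", "positioned", "attached", "put", "eaten"]
logits = [1.0, 1.0, 1.0, 1.0, 0.0]  # illustrative scores, not real model output
concept = [0, 1, 2, 3]              # indices of the assumed equivalent tokens

token_level = ntp_loss(logits, target_id=0)
concept_level = concept_loss(logits, concept)
print(f"NTP loss: {token_level:.4f}  concept loss: {concept_level:.4f}")
```

Under this assumed formulation, a model that spreads probability across "placed," "positioned," "attached," and "put" is penalized by the token-level loss but not by the concept-level loss, and the two losses coincide when the concept set contains only the gold token.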

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC