Learning Concepts, Not Tokens: Self-Supervised Semantic Alignment for Language Models
Title: Prioritizing Concepts Over Tokens: Self-Supervised Semantic Alignment for Language Models
Abstract:
Standard next-token prediction (NTP) objectives train language models to predict one specific token at each step, even though many distinct continuations can convey the same meaning. For instance, in the phrase "this sticker can be placed here," words such as "positioned," "attached," or "put" are valid, semantically equivalent alternatives to "placed." Conventional NTP training nevertheless treats these interchangeable options as mutually exclusive targets. In contrast, we investigate a self-supervised approach that trains models to predict underlying concepts, which we approximate as sets of semantically equivalent tokens. Models trained with this concept-level supervision align more closely with human similarity judgments and improve on classification, clustering, and reranking tasks. Their reasoning performance is on par with, or superior to, that of existing methods. These gains come with lower perplexity on semantically significant terms (see Section 3.2) and only negligible increases in overall perplexity, indicating that concept-level supervision improves semantic alignment without compromising general language modeling ability. Our source code is accessible at https://anonymous.4open.science/r/learning-concepts-9025 .
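The contrast between token-level and concept-level targets can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a concept's probability is the sum of the model's probabilities over its member tokens, and the toy vocabulary, logits, and function names (`ntp_loss`, `concept_loss`) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ntp_loss(logits, target_id):
    """Standard next-token loss: negative log-probability of one token."""
    return -math.log(softmax(logits)[target_id])

def concept_loss(logits, concept_ids):
    """Illustrative concept-level loss (an assumption, not the paper's
    stated objective): negative log of the total probability mass
    assigned to any token in the concept set."""
    probs = softmax(logits)
    return -math.log(sum(probs[i] for i in set(concept_ids)))

# Toy vocabulary and logits, invented for this example.
vocab = ["placed", "positioned", "attached", "put", "eaten"]
logits = [1.0, 1.2, 0.8, 1.1, -2.0]

# Hypothetical concept set: tokens interchangeable with "placed".
concept = [vocab.index(w) for w in ("placed", "positioned", "attached", "put")]

loss_token = ntp_loss(logits, vocab.index("placed"))
loss_concept = concept_loss(logits, concept)

print(f"token-level loss:   {loss_token:.3f}")
print(f"concept-level loss: {loss_concept:.3f}")
```

Under this assumed formulation, the concept-level loss never exceeds the token-level loss for a target inside the set, because probability placed on an equivalent token such as "put" is no longer penalized.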
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





