CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction
Title: CTR-Sink: Leveraging Attention Sinks for Language Models in Click-Through Rate Prediction
Abstract:
Click-Through Rate (CTR) prediction serves as a fundamental component of recommendation systems, aiming to gauge the probability of user clicks based on past behavioral records. Recently, there has been growing interest in treating user behavior sequences as textual data to harness the powerful semantic comprehension and contextual modeling strengths of Language Models (LMs). Nevertheless, a significant structural discrepancy remains: unlike the coherent natural language used during LM pre-training, user behavior sequences are composed of discrete actions separated by semantically void delimiters. This incongruity leads to semantic fragmentation, causing the attention mechanisms within LMs to disperse across irrelevant tokens rather than concentrating on meaningful behavioral boundaries and the relationships between actions, which ultimately undermines prediction accuracy.
To overcome this challenge, we introduce $\textit{CTR-Sink}$, a framework that integrates behavior-level attention sinks designed for recommendation contexts. Drawing inspiration from attention sink theory, the approach establishes sinks that act as attention focal points and dynamically controls attention aggregation through external information. Specifically, we place sink tokens between successive behaviors and embed recommendation-specific cues, such as temporal distance, so that they function as stable attention anchors. Furthermore, to improve the framework's versatility, we develop a two-stage training protocol that explicitly directs LM attention toward these sink tokens. This is complemented by an attention sink mechanism that strengthens dependencies between sinks, thereby capturing behavioral correlations more accurately. Experiments on one industrial dataset and two open-source benchmarks (MovieLens and KuaiRec), along with visual analyses, confirm the efficacy of our method across diverse scenarios.
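
As a rough illustration of placing sink tokens between successive behaviors, the Python sketch below builds a textual behavior sequence in which each pair of consecutive behaviors is separated by a marker carrying their temporal distance. The token string `[SINK]`, the `gap:` wording, the bucket thresholds, and the function names `bucket_gap` and `insert_sinks` are all hypothetical; the abstract does not specify how the cue is encoded, so this is only a sketch under those assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical sink token; the abstract does not name the actual token.
SINK = "[SINK]"


def bucket_gap(seconds: float) -> str:
    """Map a time gap in seconds to a coarse text bucket (assumed thresholds)."""
    if seconds < 3600:
        return "under an hour"
    if seconds < 86400:
        return "under a day"
    return f"{int(seconds // 86400)}d"


def insert_sinks(behaviors: List[Tuple[str, float]]) -> str:
    """Join (text, unix_timestamp) behaviors into one string, placing a sink
    marker with the temporal distance between each consecutive pair."""
    parts: List[str] = []
    for i, (text, ts) in enumerate(behaviors):
        if i > 0:
            # Temporal distance to the previous behavior, in seconds.
            gap = ts - behaviors[i - 1][1]
            parts.append(f"{SINK} gap: {bucket_gap(gap)}")
        parts.append(text)
    return " ".join(parts)


history = [
    ("watched Toy Story", 0),
    ("watched Heat", 1800),
    ("watched Alien", 1800 + 2 * 86400),
]
sequence = insert_sinks(history)
```

The resulting string would then be tokenized and passed to the LM. Here the sink marker replaces a semantically void delimiter with one that carries a recommendation-specific signal, which is the role the abstract assigns to sink tokens.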
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





