arXiv

CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction

Title: CTR-Sink: Leveraging Attention Sinks for Language Models in Click-Through Rate Prediction

Abstract:

Click-Through Rate (CTR) prediction serves as a fundamental component of recommendation systems, aiming to gauge the probability of user clicks based on past behavioral records. Recently, there has been growing interest in treating user behavior sequences as textual data to harness the powerful semantic comprehension and contextual modeling strengths of Language Models (LMs). Nevertheless, a significant structural discrepancy remains: unlike the coherent natural language used during LM pre-training, user behavior sequences are composed of discrete actions separated by semantically void delimiters. This incongruity leads to semantic fragmentation, causing the attention mechanisms within LMs to disperse across irrelevant tokens rather than concentrating on meaningful behavioral boundaries and the relationships between actions, which ultimately undermines prediction accuracy.

To overcome this challenge, we introduce $\textit{CTR-Sink}$, a framework that adds behavior-level attention sinks designed for recommendation contexts. Drawing on attention sink theory, the approach constructs sinks that attract attention and dynamically regulates attention aggregation through external information. Concretely, we place sink tokens between successive behaviors and embed recommendation-specific cues, such as temporal distance, so that these tokens act as stable attention anchors. To improve the framework's generality, we develop a two-stage training protocol that explicitly directs LM attention toward the sink tokens, complemented by an attention sink mechanism that strengthens dependencies between sinks and thereby captures behavioral correlations more accurately. Experiments on one industrial dataset and two open-source benchmarks (MovieLens and KuaiRec), along with visual analyses, confirm the efficacy of our method across diverse scenarios.
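The sink-token placement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Behavior` type, the `[SINK:...]` token format, the bucket boundaries in `gap_bucket`, and the choice to measure temporal distance from each behavior to the prediction time are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Behavior:
    text: str       # textual description of the interacted item
    timestamp: int  # interaction time in Unix seconds


def gap_bucket(seconds: int) -> str:
    """Map a time gap to a coarse label (hypothetical bucket boundaries)."""
    if seconds < 3600:
        return "within_hour"
    if seconds < 86400:
        return "within_day"
    if seconds < 7 * 86400:
        return "within_week"
    return "over_week"


def build_sequence(behaviors: List[Behavior], target_time: int) -> List[str]:
    """Interleave behaviors with sink tokens that carry a temporal cue.

    Assumption: each sink encodes the distance from the preceding
    behavior to the prediction time; the paper may define it differently.
    """
    pieces: List[str] = []
    for b in behaviors:
        pieces.append(b.text)
        pieces.append(f"[SINK:{gap_bucket(target_time - b.timestamp)}]")
    return pieces


history = [
    Behavior("Toy Story (1995)", 0),
    Behavior("Heat (1995)", 950_000),
    Behavior("Casino (1995)", 999_000),
]
sequence = build_sequence(history, target_time=1_000_000)
print(" ".join(sequence))
```

In this sketch the sink tokens replace semantically void delimiters, so every behavior boundary carries a recommendation-specific signal. The training protocol that steers attention toward these tokens is not shown here.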


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
