arXiv

DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling

Abstract:

Although continuous embeddings have driven significant advancements in e-commerce search relevance, a persistent challenge remains: the inability to effectively capture fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) offer a compelling alternative, current generation techniques predominantly depend on unsupervised quantization. In real-world applications, the absence of explicit supervision complicates the determination of which items should share an SID, thereby restricting the model’s capacity for query-dependent ranking.

To overcome the limitations of unsupervised SIDs, we introduce the Discrete Semantic Identifier Relevance Model (DSIRM), which explicitly models discrete relevance features. On the item side, our approach employs a query-bridged contrastive quantization method that incorporates query-item interaction supervision into Residual Quantization to learn relevance-aware semantic partitions. On the query side, we leverage generative Large Language Models (LLMs) to predict item SIDs directly from textual input, which addresses tail queries and intent ambiguity. Hierarchical prefix matching between query and item SIDs then generates discriminative features that complement dense signals.
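As a rough illustration of how residual-quantized SIDs and hierarchical prefix matching could fit together, the following minimal Python sketch may help. It is not the authors' implementation: the function names `rq_encode` and `prefix_match_features`, the toy codebooks, and the binary per-level feature encoding are all assumptions made for illustration. In particular, the codebooks here are fixed by hand, whereas the abstract describes learning them with query-item supervision, and the query SID is simply given rather than predicted by an LLM.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = Sequence[Vector]


def rq_encode(vec: Vector, codebooks: Sequence[Codebook]) -> Tuple[int, ...]:
    """Plain greedy residual quantization (hypothetical helper).

    At each level, pick the nearest codeword to the current residual,
    record its index, and subtract it before moving to the next level.
    """
    residual = list(vec)
    sid: List[int] = []
    for codebook in codebooks:
        # Nearest codeword by squared Euclidean distance.
        idx = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def prefix_match_features(
    query_sid: Sequence[int], item_sid: Sequence[int]
) -> List[int]:
    """One binary feature per level: 1 if the SIDs agree on the whole
    prefix up to and including that level, else 0 (assumed encoding)."""
    depth = 0
    for q, i in zip(query_sid, item_sid):
        if q != i:
            break
        depth += 1
    levels = min(len(query_sid), len(item_sid))
    return [1 if level < depth else 0 for level in range(levels)]


# Toy two-level codebooks (assumed values, 2-D for readability).
CODEBOOKS = [
    [[0.0, 0.0], [10.0, 10.0]],             # coarse level
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],   # fine level
]

item_a = rq_encode([10.9, 10.1], CODEBOOKS)   # (1, 1)
item_b = rq_encode([10.1, 10.8], CODEBOOKS)   # (1, 2)
features = prefix_match_features(item_a, item_b)  # [1, 0]
```

In this toy setup the two vectors share the coarse code but differ at the fine level, so the feature vector signals a match at the first level only. Features of this kind are discrete and could be fed to a ranker alongside dense similarity scores.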

Extensive experiments conducted on Tmall’s production data demonstrate the superiority of our proposed method, yielding an offline AUC improvement of +1.54%. Following deployment through an efficient hybrid architecture, the system achieved notable online performance gains, including a +0.13% increase in UCTR and a +0.25% rise in UCTCVR, underscoring its substantial industrial value.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
