DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling
Abstract:
Although continuous embeddings have driven significant advances in e-commerce search relevance, a persistent challenge remains: they struggle to capture fine-grained attribute distinctions. Discrete Semantic Identifiers (SIDs) offer a compelling alternative, but current SID generation techniques rely predominantly on unsupervised quantization. In real-world applications, the absence of explicit supervision makes it hard to determine which items should share an SID, which restricts the model's capacity for query-dependent ranking.
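For readers unfamiliar with SIDs, residual quantization (RQ) is a common way to turn a dense item embedding into a short tuple of discrete codes: each level picks the nearest codeword to what the previous levels left unexplained, so earlier codes act as coarse clusters and later codes refine them. The sketch below shows only plain, unsupervised RQ encoding, assuming the per-level codebooks have already been learned (for example by k-means on residuals); the codebook values and the function names `rq_encode` and `rq_decode` are hypothetical, and DSIRM's query-bridged contrastive supervision is not modeled here.

```python
from typing import List, Sequence, Tuple

Codebook = Sequence[Sequence[float]]  # one level: K codewords of dimension D


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(residual: Sequence[float], codebook: Codebook) -> int:
    """Index of the codeword closest to the current residual (first wins on ties)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))


def rq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[int, ...]:
    """Greedy residual quantization: one code per level, coarse to fine."""
    residual = list(x)
    sid: List[int] = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        sid.append(idx)
        # Subtract the chosen codeword; the next level quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def rq_decode(sid: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for level, idx in enumerate(sid):
        out = [o + c for o, c in zip(out, codebooks[level][idx])]
    return out


if __name__ == "__main__":
    # Hypothetical 2-level codebooks for 2-D item embeddings.
    codebooks = [
        [[1.0, 0.0], [0.0, 1.0]],               # level 1: coarse
        [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],  # level 2: fine
    ]
    sid = rq_encode([1.05, 0.1], codebooks)
    print(sid)                        # (0, 1)
    print(rq_decode(sid, codebooks))  # [1.0, 0.1]
```

Items that receive the same level-1 code fall into the same coarse partition. Because the codebooks here are fit only to reconstruct embeddings, nothing forces items sharing a prefix to be relevant to the same queries, which is the gap the abstract attributes to unsupervised SIDs.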
To overcome the limitations of unsupervised SIDs, we introduce the Discrete Semantic Identifier Relevance Model (DSIRM), which explicitly models discrete relevance features. On the item side, a query-bridged contrastive quantization method incorporates query-item interaction supervision into Residual Quantization, so that the learned semantic partitions reflect relevance rather than embedding similarity alone. On the query side, a generative Large Language Model (LLM) explicitly predicts item SIDs from the query text, which helps with tail queries and intent ambiguity. Hierarchical prefix matching between query and item SIDs then yields discriminative features that complement dense signals.
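The abstract does not spell out how the hierarchical prefix-matching features are computed. As a minimal sketch, assume the LLM returns several candidate SIDs for a query (for example from beam search) and that the features are per-level match indicators plus the deepest matched level, taking the best candidate. The names `prefix_match_features`, `match_l1`, `match_depth`, and the example SIDs are hypothetical.

```python
from typing import Dict, Sequence, Tuple

SID = Tuple[int, ...]  # hierarchical codes, coarse to fine


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading levels on which two SIDs agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_match_features(query_sids: Sequence[SID], item_sid: SID) -> Dict[str, float]:
    """Hypothetical discrete features: deepest prefix match between any
    predicted query SID and the item SID, expanded into per-level indicators."""
    depth = max((common_prefix_len(q, item_sid) for q in query_sids), default=0)
    # match_l{k} is 1.0 when the first k levels match for some candidate.
    feats = {f"match_l{level + 1}": float(depth > level) for level in range(len(item_sid))}
    feats["match_depth"] = float(depth)
    return feats


if __name__ == "__main__":
    query_sids = [(3, 1, 7), (3, 2, 0)]  # hypothetical LLM beam outputs
    print(prefix_match_features(query_sids, (3, 2, 5)))
    # {'match_l1': 1.0, 'match_l2': 1.0, 'match_l3': 0.0, 'match_depth': 2.0}
```

Matching on prefixes rather than exact SIDs lets a query that only identifies the coarse category still contribute a signal, while a full-depth match marks a much more specific agreement; such sparse indicators can be fed to a ranker alongside dense similarity scores.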
Extensive experiments on Tmall's production data demonstrate the superiority of the proposed method, which yields an offline AUC improvement of +1.54%. Deployed through an efficient hybrid architecture, the system achieved online gains of +0.13% in UCTR and +0.25% in UCTCVR, underscoring its industrial value.
Source: arXiv. Generated at: 2026-06-04 00:00:00 UTC



