Semantic Retrieval for Product Search in E-Commerce
Abstract:
E-commerce semantic retrieval must match short, noisy, colloquial user queries against large product catalogs in which many items differ only in subtle attributes. To address this, we introduce a Siamese LLM dual-encoder, in which a single shared encoder embeds both queries and products, trained with a two-stage pipeline. The first stage uses contrastive learning with a false-negative margin mask, which prevents the model from being penalized for scoring near-duplicate products highly when they appear as negatives. The second stage uses Relative Odds Alignment for Retrieval (ROAR), a preference-optimization method that generalizes the Bradley-Terry model to variable-sized groups of graded relevance by enforcing margins on the odds ratios between consecutive relevance grades.
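The abstract does not specify the exact form of the false-negative margin mask. The following is one plausible reading, offered as a sketch under stated assumptions. It is an in-batch InfoNCE loss where `scores[i][i]` is query i's labeled positive product. An off-diagonal product is dropped from query i's negatives when its score comes within a hypothetical `margin` of the positive's score, because such a product is more likely a near-duplicate than a true negative. The function name `masked_info_nce` and the `margin` and `temperature` defaults are illustrative, not the authors' values.

```python
import math
from typing import List


def masked_info_nce(scores: List[List[float]], margin: float = 0.1,
                    temperature: float = 0.05) -> float:
    """In-batch InfoNCE loss with a false-negative margin mask.

    Hypothetical sketch, not the paper's exact formulation.
    scores[i][j] is the similarity between query i and product j; the
    diagonal entry scores[i][i] is the labeled positive for query i.
    Assumed masking rule: product j is removed from query i's negatives
    when scores[i][j] > scores[i][i] - margin.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        logits = [pos / temperature]  # index 0 holds the positive
        for j in range(n):
            if j == i:
                continue
            if scores[i][j] > pos - margin:
                continue  # masked: treated as a likely false negative
            logits.append(scores[i][j] / temperature)
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - logits[0]  # -log softmax probability of the positive
    return total / n
```

Increasing `margin` masks more in-batch negatives. If it is large enough to mask every negative, the loss collapses to zero, so a masking rule of this kind only helps when the margin stays small relative to typical score gaps.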
The training data mirrors this two-stage structure: Stage 1 uses substitute query-product pairs for coarse semantic supervision, and Stage 2 uses graded relevance annotations to refine fine-grained ranking. The resulting system retrieves exact matches and ranks substitute and complementary items appropriately. The improvements hold across query-frequency levels and business sectors, and large-scale online A/B tests confirm that they are statistically significant.
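For the Stage 2 objective, the following is a hypothetical ROAR-style loss and not the authors' formulation. It assumes each model score is the log-odds of relevance, so a score difference is a log odds ratio. It averages scores within each grade group to handle variable group sizes and applies a Bradley-Terry term with a margin between each pair of consecutive grades. With two single-item groups and zero margin, it reduces to the standard Bradley-Terry loss -log σ(s_i - s_j). The names `graded_odds_loss`, `scores_by_grade`, and `margins` are illustrative.

```python
import math
from typing import Dict, List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def graded_odds_loss(scores_by_grade: Dict[int, List[float]],
                     margins: Dict[int, float]) -> float:
    """Bradley-Terry-style loss over consecutive relevance grades.

    Hypothetical sketch, not the paper's exact ROAR objective.
    scores_by_grade maps a relevance grade (higher = more relevant) to the
    scores of one query's products with that grade; group sizes may differ.
    For each consecutive pair of non-empty grades (hi, lo), the mean hi score
    should exceed the mean lo score by margins[hi] (default 0.0).
    """
    grades = sorted((g for g, s in scores_by_grade.items() if s), reverse=True)
    if len(grades) < 2:
        return 0.0  # nothing to compare
    loss = 0.0
    for hi, lo in zip(grades, grades[1:]):
        mean_hi = sum(scores_by_grade[hi]) / len(scores_by_grade[hi])
        mean_lo = sum(scores_by_grade[lo]) / len(scores_by_grade[lo])
        # Difference of log-odds = log odds ratio between the two groups.
        loss -= log_sigmoid(mean_hi - mean_lo - margins.get(hi, 0.0))
    return loss / (len(grades) - 1)
```

Averaging within each group is only one way to handle unequal group sizes. Summing over all cross-grade item pairs would instead weight large groups more heavily.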
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





