BEATS: Bootstrapping E-commerce Attribute Taxonomies for Search through Iterative Human-AI Collaboration
Abstract: In emerging markets, e-commerce platforms frequently struggle with product catalogs that rely solely on broad category taxonomies, lacking the structured attribute schemas necessary for advanced functionality. This deficiency in fine-grained product details hampers search performance by inhibiting faceted filtering, impairing query comprehension, and diminishing the semantic quality of search system representations. To address this, we introduce BEATS, a human-in-the-loop framework leveraging Large Language Models (LLMs) to construct product attribute taxonomies from the ground up.
Our methodology enhances a multi-stage LLM generation pipeline with two essential production phases: (1) proactive quality control executed by model developers to identify and remove erroneous outputs, and (2) validation of generated attributes by local domain-expert staff through human annotation. The system functions iteratively, refining prompts at each generation stage based on insights from quality checks and annotator feedback across multiple rounds. This continuous feedback loop progressively elevates the quality of the attributes. Following the establishment of the taxonomy, LLMs are utilized to assign structured attributes to individual products, thereby enriching their contextual data.
This enriched catalog enhances various search system components by enabling detailed attribute-based filtering, supplying structured features for ranking algorithms, and strengthening semantic representations for dense retrieval. We validated the taxonomy’s efficacy by training dense retrieval models on the attribute-enriched data, achieving consistent performance gains over baselines that utilized the original catalog information. The system is currently operational at Rakuten Taiwan, where it has enriched 9 primary categories covering 2,694 sub-categories with 67,277 generated attributes. Furthermore, more than 5.4 million products have been tagged with these attributes, and we plan to extend this enrichment to the entire product catalog.
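
The iterative loop described in the abstract (generate, developer quality control, expert annotation, prompt refinement) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the BEATS implementation: the function names `refine_taxonomy`, `toy_generate`, `toy_qc`, `toy_annotate`, and `toy_refine` are hypothetical, and the LLM call and the human annotators are replaced with simple stand-in functions.

```python
from typing import Callable


def refine_taxonomy(
    generate: Callable[[str], list[str]],
    qc_filter: Callable[[str], bool],
    annotate: Callable[[str], bool],
    refine_prompt: Callable[[str, list[str]], str],
    prompt: str,
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Run generate -> QC -> annotation rounds, refining the prompt each time.

    All four callables are hypothetical stand-ins for the stages named in
    the abstract; the stopping rule (no rejections) is also an assumption.
    """
    accepted: list[str] = []
    for _ in range(max_rounds):
        candidates = generate(prompt)
        # Phase 1: developer-side quality control drops erroneous outputs.
        passed_qc = [a for a in candidates if qc_filter(a)]
        # Phase 2: domain experts validate the surviving attributes.
        accepted = [a for a in passed_qc if annotate(a)]
        rejected = [a for a in candidates if a not in accepted]
        if not rejected:
            break
        # Feed the rejections back into the prompt for the next round.
        prompt = refine_prompt(prompt, rejected)
    return prompt, accepted


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt: str) -> list[str]:
    pool = ["color", "material", "", "shipping fee"]
    return [a for a in pool if f"avoid '{a}'" not in prompt]


def toy_qc(attribute: str) -> bool:
    return bool(attribute.strip())  # reject empty outputs


def toy_annotate(attribute: str) -> bool:
    return attribute != "shipping fee"  # not a product attribute


def toy_refine(prompt: str, rejected: list[str]) -> str:
    return prompt + "".join(f" avoid '{a}'" for a in rejected)


final_prompt, attributes = refine_taxonomy(
    toy_generate, toy_qc, toy_annotate, toy_refine,
    prompt="List attributes for: sneakers",
)
```

In this toy run the first round rejects the empty output and "shipping fee", the prompt is extended with those rejections, and the second round produces no further rejections, so the loop stops early.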
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC


