arXiv

BEATS: Bootstrapping E-commerce Attribute Taxonomies for Search through Iterative Human-AI Collaboration

Abstract: In emerging markets, e-commerce platforms frequently struggle with product catalogs that rely solely on broad category taxonomies, lacking the structured attribute schemas necessary for advanced functionality. This deficiency in fine-grained product details hampers search performance by inhibiting faceted filtering, impairing query comprehension, and diminishing the semantic quality of search system representations. To address this, we introduce BEATS, a human-in-the-loop framework leveraging Large Language Models (LLMs) to construct product attribute taxonomies from the ground up.

Our methodology enhances a multi-stage LLM generation pipeline with two essential production phases: (1) proactive quality control executed by model developers to identify and remove erroneous outputs, and (2) validation of generated attributes by local domain-expert staff through human annotation. The system functions iteratively, refining prompts at each generation stage based on insights from quality checks and annotator feedback across multiple rounds. This continuous feedback loop progressively elevates the quality of the attributes. Following the establishment of the taxonomy, LLMs are utilized to assign structured attributes to individual products, thereby enriching their contextual data.
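The iterative loop described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `run_rounds`, the `RoundResult` record, and all four callables (`generate`, `quality_check`, `annotate`, `refine`) are hypothetical names introduced here, and the toy stand-ins replace the real LLM call, the developer-side quality control, and the human annotators.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundResult:
    """Outcome of one generation round (hypothetical record type)."""
    prompt: str
    accepted: list[str]
    rejected: list[str]


def run_rounds(
    prompt: str,
    generate: Callable[[str], list[str]],      # stand-in for the LLM generation stage
    quality_check: Callable[[str], bool],      # developer-side filter for erroneous outputs
    annotate: Callable[[str], bool],           # domain-expert verdict on one attribute
    refine: Callable[[str, list[str]], str],   # folds feedback into the next prompt
    max_rounds: int = 3,
) -> list[RoundResult]:
    """Generate, filter, validate, then refine the prompt until nothing is rejected."""
    history: list[RoundResult] = []
    for _ in range(max_rounds):
        candidates = generate(prompt)
        passed_qc = [a for a in candidates if quality_check(a)]
        accepted = [a for a in passed_qc if annotate(a)]
        rejected = [a for a in candidates if a not in accepted]
        history.append(RoundResult(prompt, accepted, rejected))
        if not rejected:
            break
        prompt = refine(prompt, rejected)
    return history


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt: str) -> list[str]:
    banned: set[str] = set()
    for line in prompt.splitlines():
        if line.startswith("Avoid: "):
            banned.update(line[len("Avoid: "):].split(", "))
    pool = ["color", "material", "free shipping", "???"]
    return [c for c in pool if c not in banned]


history = run_rounds(
    prompt="List attributes for: yoga pants",
    generate=toy_generate,
    quality_check=lambda a: any(ch.isalpha() for ch in a),  # drops garbled output
    annotate=lambda a: a != "free shipping",                # not a product attribute
    refine=lambda p, rejected: p + "\nAvoid: " + ", ".join(rejected),
)
```

In this toy run the first round rejects one garbled output and one non-attribute, the prompt is refined with that feedback, and the second round produces only accepted attributes, so the loop stops early.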

This enriched catalog enhances various search system components by enabling detailed attribute-based filtering, supplying structured features for ranking algorithms, and strengthening semantic representations for dense retrieval. We validated the taxonomy’s efficacy by training dense retrieval models on the attribute-enriched data, achieving consistent performance gains over baselines that utilized the original catalog information. The system is currently operational at Rakuten Taiwan, where it has enriched 9 primary categories covering 2,694 sub-categories with 67,277 generated attributes. More than 5.4 million products have been tagged with these attributes, and the enrichment is planned to expand to the entire product catalog.
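As a rough illustration of how tagged attributes might feed the two uses named above, the sketch below builds an attribute-enriched text for a dense retrieval encoder and applies a simple facet filter. The abstract does not specify the record layout or the text format, so the `Product` type, the `enrich_text` and `facet_filter` helpers, and the sample items are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical catalog record: a title plus LLM-assigned attributes."""
    title: str
    attributes: dict[str, str] = field(default_factory=dict)


def enrich_text(product: Product) -> str:
    """Concatenate title and attributes into one string for an encoder.

    The "key: value" layout joined by " | " is an assumed format.
    """
    parts = [product.title]
    parts += [f"{k}: {v}" for k, v in sorted(product.attributes.items())]
    return " | ".join(parts)


def facet_filter(products: list[Product], **wanted: str) -> list[Product]:
    """Keep products whose attributes match every requested facet value."""
    return [
        p for p in products
        if all(p.attributes.get(k) == v for k, v in wanted.items())
    ]


# Illustrative items only, not data from the paper.
catalog = [
    Product("Yoga pants", {"color": "black", "material": "nylon"}),
    Product("Running shorts", {"color": "black", "material": "polyester"}),
    Product("Cotton tee"),  # not yet tagged
]

enriched = [enrich_text(p) for p in catalog]
black_nylon = facet_filter(catalog, color="black", material="nylon")
```

An untagged product falls back to its title alone, which corresponds to the original catalog information that the baselines use.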


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
