arXiv

Taiji: Pareto Optimal Policy Optimization with Semantics-IDs Trade-off for Industrial LLM-Enhanced Recommendation

The integration of large language models (LLMs) into recommender systems has emerged as a dominant trend within the industry. Nevertheless, aligning the semantic space of LLMs with the identifier (ID) space of recommenders through post-training techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), continues to present significant hurdles. Current LLM4Rec approaches are primarily constrained by two critical issues: first, the challenge of quantifying and enhancing the quality of chain-of-thought (CoT) reasoning during SFT in open-domain recommendation contexts; and second, the failure to adequately balance the trade-off between LLM semantic rewards and recommendation preference rewards during RL alignment.

Addressing these obstacles, we introduce Taiji, an innovative LLM-as-Enhancer framework tailored for industrial-scale recommender systems. To surmount the SFT bottleneck, our method employs reverse-engineered reasoning alongside open-ended rejection sampling to synthesize high-quality, domain-specific CoT data. To address the complexities of RL alignment, we propose Pareto Optimal Policy Optimization (POPO). This mechanism dynamically calibrates cross-domain reward weights, theoretically securing an optimal equilibrium between the LLM’s semantic world knowledge and the collaborative ID features that reflect real-time user preferences.
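To make the idea of dynamically calibrated reward weights concrete, here is a minimal sketch of one standard way to balance two objectives: the closed-form min-norm weighting used in multiple-gradient descent (MGDA). This is an illustrative assumption, not the POPO algorithm itself, which the abstract does not specify. The names `min_norm_weight`, `combine`, `g_sem`, and `g_pref` are hypothetical; `g_sem` and `g_pref` stand in for policy gradients of the semantic reward and the recommendation preference reward.

```python
from typing import List, Sequence


def min_norm_weight(g_sem: Sequence[float], g_pref: Sequence[float]) -> float:
    """Return alpha in [0, 1] minimizing ||alpha*g_sem + (1-alpha)*g_pref||^2.

    Assumption: this is the two-objective MGDA closed form, used here only
    as a stand-in for a Pareto-style reward-weighting rule.
    """
    diff = [a - b for a, b in zip(g_sem, g_pref)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any weight gives the same update direction.
        return 0.5
    numer = sum((b - a) * b for a, b in zip(g_sem, g_pref))
    # Clip to the simplex so both weights stay non-negative.
    return min(1.0, max(0.0, numer / denom))


def combine(g_sem: Sequence[float], g_pref: Sequence[float]) -> List[float]:
    """Blend the two gradients with the min-norm weight."""
    alpha = min_norm_weight(g_sem, g_pref)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g_sem, g_pref)]


if __name__ == "__main__":
    # Orthogonal toy gradients of equal norm receive equal weight.
    print(min_norm_weight([1.0, 0.0], [0.0, 1.0]))
    print(combine([1.0, 0.0], [0.0, 1.0]))
```

When the blended direction is non-zero, stepping along it does not decrease either objective to first order; when it is zero, the current point is Pareto-stationary for the two rewards. Recomputing the weight at each update is one simple way the balance can shift as the two reward signals change.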

The efficacy of Taiji is substantiated by comprehensive offline evaluations and online A/B testing. Since its deployment on Kuaishou’s advertising platform in May 2026, Taiji has been processing requests for more than 400 million users on a daily basis. The system has delivered substantial commercial returns, confirming its robust scalability in web-scale operational environments.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
