LLM-Assisted Reranking to Operationalize Nuanced Objectives in Recommender Systems
Title: Leveraging LLM-Assisted Reranking to Implement Complex Goals in Recommendation Engines
Abstract: Recommender systems have evolved from simple content-organization utilities into influential platforms that mold daily behavior. By curating the information users encounter, these systems influence perception, thereby raising significant concerns regarding filter bubbles, societal polarization, radicalization, and widening social inequality. The integration of Large Language Models (LLMs) enhances personalization capabilities, which further intensifies these effects. However, current recommender systems are primarily optimized for engagement or narrow accuracy metrics, largely neglecting broader social consequences, such as how personalized algorithms alter exposure in domains with significant societal impact.
This study examines whether LLM-assisted reranking, despite enhancing personalization, inadvertently increases user exposure to politically conspiratorial or ideologically extreme content, a risk that has been theorized but lacks empirical evidence in the context of news recommendation. Using real news-consumption logs, we applied zero-shot, instruction-based prompting to rerank YouTube’s sidebar video suggestions. We compared a standard baseline prompt against a constrained version designed to maintain topical relevance and broaden ideological diversity while suppressing conspiratorial or extreme material.
Our findings indicate that unconstrained reranking improved personalization but heightened the visibility of extremist and conspiratorial content for users with such histories. Conversely, applying lightweight, prompt-level regularization curbed the promotion of extreme material and fostered greater ideological diversity, incurring only a minor reduction in relevance. Synthetic experiments further reveal that LLMs perform reranking based on linguistic statistical patterns rather than a semantic comprehension of ideology. This insight explains why unregulated prompts tend to amplify existing biases and how regularization can effectively mitigate them. Collectively, these results demonstrate the capacity of LLMs to embed contextual nuance into high-stakes recommendations, while underscoring the necessity of assessing LLM-driven personalization beyond mere accuracy. They also argue for viewing prompt design as a value-driven choice rather than a neutral technical default.
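As a rough illustration of the baseline-versus-constrained setup described above, the following minimal Python sketch assembles the two zero-shot reranking prompts and parses a model reply into an ordering. The abstract does not give the actual prompts or pipeline, so everything here is an assumption: the instruction texts, the function names `build_rerank_prompt` and `parse_ranking`, and the reply format are hypothetical. No LLM is called; the reply is treated as a plain string.

```python
from typing import List

# Hypothetical instruction texts; the prompts used in the study are not
# given in the abstract, so these are illustrative stand-ins only.
BASELINE_INSTRUCTION = (
    "Rerank the candidate videos so that the ones this user is most likely "
    "to watch next come first."
)
CONSTRAINT_INSTRUCTION = (
    "Keep the ranking topically relevant to the watch history, broaden the "
    "ideological diversity of the top results, and rank conspiratorial or "
    "extreme material lower."
)


def build_rerank_prompt(history: List[str], candidates: List[str],
                        constrained: bool = False) -> str:
    """Assemble a zero-shot reranking prompt from video titles only."""
    lines = [BASELINE_INSTRUCTION]
    if constrained:
        # The constrained variant differs only by one added instruction.
        lines.append(CONSTRAINT_INSTRUCTION)
    lines.append("Watch history:")
    lines += [f"- {title}" for title in history]
    lines.append("Candidates:")
    lines += [f"{i}. {title}" for i, title in enumerate(candidates, start=1)]
    lines.append(
        "Answer with the candidate numbers in ranked order, comma-separated."
    )
    return "\n".join(lines)


def parse_ranking(reply: str, n_candidates: int) -> List[int]:
    """Turn a reply such as '3, 1, 2' into a full 0-based permutation.

    Invalid or repeated numbers are ignored. Any candidate the model left
    out is appended in its original order, so the result is always a
    permutation of range(n_candidates).
    """
    order: List[int] = []
    for token in reply.replace(",", " ").split():
        if token.isdecimal():
            idx = int(token) - 1
            if 0 <= idx < n_candidates and idx not in order:
                order.append(idx)
    order += [i for i in range(n_candidates) if i not in order]
    return order
```

Keeping the constraint as a single appended instruction mirrors the idea of lightweight, prompt-level regularization: the two conditions share the same history and candidate list, so any difference in the resulting ranking can be attributed to the added instruction.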
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



