arXiv

LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

June 3, 2026 · Minh Chu Xuan, Tien-Phat Nguyen, Linh Ngo Van, Dinh Viet Sang, Nguyen Thi Ngoc Diep, Trung Le

Title: LLM-XTM: Boosting Cross-Lingual Topic Models via Large Language Models

Abstract: Cross-lingual topic modeling aims to identify common semantic patterns across languages. However, current methods typically rely on limited bilingual resources, which frequently results in topics that are incoherent or poorly aligned. Recent approaches that leverage Large Language Models (LLMs) have improved interpretability, but they incur high computational costs, operate only at the document level, and are susceptible to hallucinations. Furthermore, earlier white-box methods required access to token probabilities, which are often unavailable. To address these challenges, we introduce LLM-XTM, a novel framework that combines LLM-guided topic refinement with self-consistency uncertainty quantification. This approach enables black-box, stable, and scalable improvement of cross-lingual topic models. Our experiments on multilingual datasets show that LLM-XTM delivers better topic coherence and alignment while minimizing dependency on bilingual dictionaries and reducing the frequency of costly LLM queries.
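To make the idea of black-box self-consistency concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the same refinement prompt has been sent to an LLM several times and that each response has already been parsed into a list of topic words; no token probabilities are used. The function names, the `min_agreement` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def self_consistency_scores(samples):
    """Return, for each word, the fraction of samples that contain it.

    `samples` is a list of word lists, one per (hypothetical) repeated
    LLM response to the same topic-refinement prompt.
    """
    if not samples:
        return {}
    counts = Counter()
    for words in samples:
        counts.update(set(words))  # count each word at most once per sample
    return {word: n / len(samples) for word, n in counts.items()}


def refine_topic(samples, min_agreement=0.6):
    """Keep only words whose agreement across samples meets the threshold.

    The threshold value is an illustrative assumption, not a value
    taken from the paper.
    """
    scores = self_consistency_scores(samples)
    kept = [w for w, s in scores.items() if s >= min_agreement]
    # Most consistent words first; ties broken alphabetically.
    return sorted(kept, key=lambda w: (-scores[w], w))


# Made-up stand-ins for three sampled LLM refinements of one topic.
samples = [
    ["economy", "market", "trade", "bank"],
    ["economy", "market", "bank", "football"],
    ["economy", "trade", "market", "tax"],
]
refined = refine_topic(samples)
```

In this toy setup, words that appear in only one of the three samples (here "football" and "tax") fall below the threshold and are dropped, which is one simple way low agreement can be treated as a sign of a possible hallucination.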

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC
