arXiv

DSL-Topic: Improving Topic Modeling by Distilling Soft Labels from Language Models

Title: Enhancing Topic Modeling Through Language Model Distillation of Soft Labels

Abstract: Conventional neural topic models are typically trained to reconstruct Bag-of-Words (BoW) representations, an objective that ignores context and suffers from data sparsity. To address these limitations, we present a new training framework for topic models, Distilling Soft Labels (DSL) from Language Models (LMs). The method projects an LM's next-token probabilities, conditioned on a specific prompt, onto a predefined vocabulary to produce contextually rich reconstruction targets (soft labels). The topic model is then trained to reconstruct these soft labels from hidden states derived from the LM. This approach yields higher-quality topics that better reflect the thematic structure of the corpus. Experiments show that DSL significantly improves both topic coherence and assignment accuracy over current baseline methods. Furthermore, we propose a retrieval-based evaluation metric, under which our method substantially outperforms existing techniques at locating semantically related documents, underscoring its value for retrieval-oriented applications.
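The soft-label construction described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how the projection is carried out, so the code assumes the simplest reading, in which each vocabulary word maps to a single LM token id and the next-token probabilities are restricted to those ids and renormalized. The function names (`project_to_vocab`, `soft_label_cross_entropy`), the toy logits, and the cross-entropy objective are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_vocab(next_token_logits, vocab_token_ids):
    """Restrict next-token probabilities to the topic-model vocabulary.

    Assumption: each vocabulary word corresponds to exactly one LM token id.
    The selected probabilities are renormalized so the soft label sums to 1.
    """
    probs = softmax(next_token_logits)
    selected = [probs[i] for i in vocab_token_ids]
    mass = sum(selected)
    return [p / mass for p in selected]


def soft_label_cross_entropy(soft_label, predicted, eps=1e-12):
    """Cross-entropy between a soft label and a predicted word distribution."""
    return -sum(q * math.log(p + eps) for q, p in zip(soft_label, predicted))


# Toy example: an LM with 6 tokens, of which 3 belong to the vocabulary.
next_token_logits = [2.0, 0.5, 1.0, -1.0, 0.0, 3.0]  # hypothetical LM output
vocab_token_ids = [0, 2, 5]                           # hypothetical word-to-token map

soft_label = project_to_vocab(next_token_logits, vocab_token_ids)

# A topic model's decoder would produce `predicted`; here two stand-ins are compared.
uniform = [1.0 / 3.0] * 3
loss_uniform = soft_label_cross_entropy(soft_label, uniform)
loss_matched = soft_label_cross_entropy(soft_label, soft_label)
```

In this sketch the prediction that matches the soft label has the lower loss, which is the behavior a reconstruction objective of this kind relies on. In the framework the abstract describes, the predicted distribution would come from a topic model operating on LM hidden states, a step this example does not model.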


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
