Agentic Clustering: Controllable Text Taxonomies via Multi-Agent Refinement
Abstract:
Current approaches to text clustering typically rely on large language models (LLMs) to generate a cluster taxonomy from a dataset and then assign individual texts to those clusters. However, these workflows are inherently rigid: both the logic governing LLM interactions and the rules for terminating, merging, or splitting clusters are hardcoded in advance. This static design limits their ability to generalize across datasets with varying structures and hinders the integration of specific user requirements, such as a desired number of clusters or particular clustering objectives.
To address these limitations, we introduce an agentic framework in which a central orchestrator LLM monitors the discovery process at every stage. Instead of following a predetermined script, the orchestrator dispatches one of several specialized agents (proposer, synthesizer, auditor, investigator, and critic), allowing the pipeline to adapt dynamically to the characteristics of the corpus. Our evaluation across seven public text-clustering benchmarks demonstrates that this method achieves state-of-the-art results, outperforming the strongest prior LLM-based baseline by as much as 32% in Adjusted Rand Index (ARI).
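The orchestrator-and-agents control flow described above can be illustrated with a minimal sketch. Only the five role names come from the abstract. The `State` fields, the `choose_next` policy, the stub agent behaviors, and the `target_k` parameter are all hypothetical stand-ins, and a deterministic rule replaces the orchestrator LLM so that the example runs without any model calls. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Role names are taken from the abstract; everything else is an assumption.
ROLES = ("proposer", "synthesizer", "auditor", "investigator", "critic")


@dataclass
class State:
    texts: List[str]
    target_k: int                                       # hypothetical user requirement
    taxonomy: List[str] = field(default_factory=list)   # current cluster labels
    history: List[str] = field(default_factory=list)    # roles dispatched so far


def propose(state: State) -> None:
    # Stub proposer: one candidate label per distinct first word.
    state.taxonomy = sorted({t.split()[0] for t in state.texts})


def synthesize(state: State) -> None:
    # Stub synthesizer: merge the last two labels into one.
    if len(state.taxonomy) >= 2:
        merged = state.taxonomy.pop() + "+" + state.taxonomy.pop()
        state.taxonomy.append(merged)


def no_op(state: State) -> None:
    # Placeholder for roles whose behavior the abstract does not specify.
    return None


AGENTS: Dict[str, Callable[[State], None]] = {
    "proposer": propose,
    "synthesizer": synthesize,
    "auditor": no_op,
    "investigator": no_op,
    "critic": no_op,
}


def choose_next(state: State) -> Optional[str]:
    # Deterministic stand-in for the orchestrator LLM's decision.
    if not state.taxonomy:
        return "proposer"
    if len(state.taxonomy) > state.target_k:
        return "synthesizer"
    if "critic" not in state.history:
        return "critic"
    return None  # signal termination


def run(state: State, max_steps: int = 20) -> State:
    # At each step the orchestrator inspects the state and picks one agent,
    # rather than following a fixed sequence of stages.
    for _ in range(max_steps):
        role = choose_next(state)
        if role is None:
            break
        AGENTS[role](state)
        state.history.append(role)
    return state


if __name__ == "__main__":
    texts = ["billing refund", "billing invoice", "login error", "shipping delay"]
    final = run(State(texts=texts, target_k=2))
    print(final.history)   # ['proposer', 'synthesizer', 'critic']
    print(final.taxonomy)
```

The point of the sketch is the loop in `run`: the next step is chosen from the current state at every iteration, so a different corpus or a different `target_k` yields a different sequence of agent calls without changing the code.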
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





