ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation
Title: ClustRecNet: An Innovative End-to-End Deep Learning Framework for Recommending Clustering Algorithms
Abstract: Selecting the most appropriate clustering algorithm for a specific dataset continues to be a core challenge in unsupervised learning. To address this, we present ClustRecNet, a new end-to-end deep learning system designed to recommend optimal clustering methods by directly extracting high-order representations from raw tabular data. To support robust meta-learning, we built a comprehensive repository containing 34,000 synthetic datasets that cover a wide spectrum of clustering scenarios. We applied ten widely used clustering algorithms to these datasets and utilized the Adjusted Rand Index (ARI) to generate ground-truth labels. The architecture of ClustRecNet integrates a convolution block, two residual blocks, and an attention mechanism to identify both local and global structural patterns, thereby circumventing the limitations of manual feature engineering. Comprehensive tests on both synthetic and real-world benchmarks reveal that ClustRecNet consistently surpasses traditional internal cluster validity indices—including Silhouette, Calinski-Harabasz, Davies-Bouldin, and Dunn—as well as leading Automated Machine Learning (AutoML) solutions like ML2DAC, AutoCluster, and AutoML4Clust. Specifically, the framework secured an average ARI improvement of 0.497 over the Calinski-Harabasz index on synthetic data and an average ARI gain of 44.16% against the top-performing AutoML tool, ML2DAC, on real-world benchmarks. The source code and data are accessible at: https://github.com/mrbakhtyari/ClustRecNet
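The ARI-based labeling step described in the abstract can be illustrated with a minimal sketch. It assumes that the label for a dataset is simply the algorithm whose partition achieves the highest ARI against the known partition; the abstract does not state the exact rule or how ties are handled. The function names, the placeholder algorithm names, and the toy partitions below are hypothetical and are not taken from the ClustRecNet repository. The ARI itself follows the standard Hubert and Arabie pair-counting formula, written with the standard library only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index (pair-counting form) for two flat partitions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Contingency counts: how many points fall in each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    # Expected index under random labeling with the same cluster sizes.
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions are a single cluster).
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


def best_algorithm(labels_true, predictions):
    """Score every candidate partition and return (winner, all scores).

    Assumption: the meta-label is the arg-max ARI; ties go to the first entry.
    """
    scores = {
        name: adjusted_rand_index(labels_true, pred)
        for name, pred in predictions.items()
    }
    winner = max(scores, key=scores.get)
    return winner, scores


# Toy example: six points in two true clusters, three hypothetical algorithms.
truth = [0, 0, 0, 1, 1, 1]
predictions = {
    "algorithm_a": [1, 1, 1, 0, 0, 0],  # same partition, labels swapped
    "algorithm_b": [0, 0, 1, 1, 2, 2],  # over-segmented
    "algorithm_c": [0, 0, 0, 0, 0, 0],  # everything in one cluster
}
winner, scores = best_algorithm(truth, predictions)
print(winner, scores)
```

Because ARI is invariant to how cluster IDs are named, the label-swapped partition scores as a perfect match, while the single-cluster partition scores zero. In a full pipeline the predictions would come from running each clustering algorithm on the raw dataset, and the winning name would become the training target for the recommender.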
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




