ERICA: Quantifying Replicability of Cluster Analysis
Clustering is a staple of scientific research, yet its results are rarely evaluated within a systematic, quantitative framework. To address this gap, we introduce ERICA (Evaluating Replicability via Iterative Clustering Assignments), a method for assessing whether identified clusters can be consistently reproduced across analyses. At the core of the pipeline is a statistic that indicates whether a dataset contains underlying structure.
In addition to this statistic, we provide quantitative visualizations that address key analytical questions, such as how similar two clusters are and which points may be outliers. In tests on synthetic datasets, ERICA detected clusters in a replicable fashion. However, when applied to three gene expression datasets aimed at validating breast cancer subtypes, it revealed instances where replicability was compromised. This research emphasizes the importance of thorough inspection in data analysis and presents ERICA as a practical tool for checking the robustness of clustering outcomes.
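
To make the idea of replicability across repeated clustering runs concrete, the following is a minimal, generic sketch of a subsampling-based stability check. It is not ERICA's statistic, which this summary does not define. The function names (`kmeans`, `co_clustering_frequencies`, `replicability_score`), the choice of k-means, the 80% subsample fraction, and the score itself are all illustrative assumptions.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns one label per point."""
    centers = rng.sample(points, k)  # assumes the points are distinct
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def co_clustering_frequencies(points, k, n_iter=30, frac=0.8, seed=0):
    """For each pair of points, the fraction of shared subsamples in which they co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together, sampled = {}, {}
    for _ in range(n_iter):
        idx = sorted(rng.sample(range(n), int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[(a, b)] = sampled.get((a, b), 0) + 1
            together[(a, b)] = together.get((a, b), 0) + (la == lb)
    return {pair: together[pair] / count for pair, count in sampled.items()}


def replicability_score(freqs):
    """Hypothetical summary: mean of |2f - 1| over pairs.

    Equals 1.0 when every pair is always together or always apart,
    and approaches 0.0 when pairs co-cluster about half the time.
    """
    return sum(abs(2 * f - 1) for f in freqs.values()) / len(freqs)
```

Calling `replicability_score(co_clustering_frequencies(points, k=2))` on a list of coordinate tuples gives a value between 0 and 1. A high value only says that the partition is stable under subsampling with this particular algorithm and `k`; it does not by itself establish that the data contain real structure, which is the question the paper's statistic is meant to address.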
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





