Prototype Selection Using Topological Data Analysis
Title: Leveraging Topological Data Analysis for Prototype Selection
Abstract:
Prototype selection techniques reduce the size of training datasets, yet current classification schemes (which group them into condensation, edition, hybrid, competence-based, optimization-based, and clustering-based methods) have no category for approaches that exploit the multi-scale topological structure of the data. To address this gap, we present two novel persistence-based prototype selectors: the Topological Prototype Selector (TPS) and the Boundary-Conscious Topological Prototype Selector (BoundaryTPS).
TPS applies two successive Rips filtrations to identify and preserve points that are either representative of the interior or critical to the boundaries. In contrast, BoundaryTPS is a single-stage method: it uses a vertex-weighted filtration that favors retaining points near decision boundaries.
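To make the notion of a vertex-weighted Rips filtration concrete, the sketch below computes 0-dimensional persistence pairs with a union-find structure. It does not reproduce TPS or BoundaryTPS, whose selection rules and weights are not given in the abstract. It assumes one common construction in which vertex i enters at its weight w[i] and edge (i, j) enters at max(d(i, j), w[i], w[j]); the function name `h0_persistence` and the example weights are hypothetical. With all weights set to zero it reduces to the ordinary Rips filtration.

```python
import math
from itertools import combinations


def h0_persistence(points, vertex_weights=None):
    """Finite H0 (birth, death) pairs of a vertex-weighted Rips filtration.

    Assumed construction (not taken from the paper): vertex i appears at
    vertex_weights[i]; edge (i, j) appears at max(dist, w[i], w[j]).
    """
    n = len(points)
    w = [0.0] * n if vertex_weights is None else [float(x) for x in vertex_weights]

    # Sort all edges by the filtration value at which they appear.
    edges = sorted(
        (max(math.dist(points[i], points[j]), w[i], w[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    parent = list(range(n))
    birth = list(w)  # birth[r] is the birth value of the component rooted at r

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for t, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge closes a cycle, no H0 event
        # Elder rule: the component born later dies at time t.
        if birth[ri] > birth[rj]:
            ri, rj = rj, ri
        pairs.append((birth[rj], t))
        parent[rj] = ri  # the older root survives
    return pairs


pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
plain = h0_persistence(pts)                      # ordinary Rips filtration
weighted = h0_persistence(pts, [0.0, 0.0, 2.0])  # third point delayed to t = 2
```

In the weighted call the third point is born later, so its component has a shorter lifetime (born at 2, merged at 4). A boundary-aware selector could plausibly use weights of this kind to change which points appear persistent, but the specific weighting used by BoundaryTPS is not stated here.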
We evaluated both methods against seven established baselines on fifteen real-world datasets. Our findings indicate that these topological approaches occupy a distinct region of the prototype-selection design space compared to existing techniques. Specifically, BoundaryTPS achieved the lowest (best) mean Friedman rank for preserving $H_1$ persistence diagrams and was significantly better than five of the seven baselines (Nemenyi test, $\alpha = 0.05$). TPS placed third on this same metric.
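The rank-based comparison above can be illustrated with a short sketch of mean Friedman ranks and the Nemenyi critical difference, CD = q_alpha * sqrt(k(k+1) / (6N)). It assumes the test compares k = 9 methods (the two proposed selectors plus seven baselines) over N = 15 datasets; the paper may aggregate differently, for example per fold. The critical values are the standard two-sided Nemenyi values for alpha = 0.05 from Demšar (2006), and the function names are illustrative.

```python
import math

# Two-sided Nemenyi critical values q_alpha for alpha = 0.05, k = 2..10 methods.
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def mean_ranks(scores, higher_is_better=True):
    """Mean rank per method (1 = best); scores[d][m] is method m on dataset d.

    Tied scores within a dataset receive the average of the tied ranks.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda m: row[m], reverse=higher_is_better)
        pos = 0
        while pos < k:
            end = pos
            # Extend the run of methods tied with the one at position pos.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1
            for m in order[pos:end + 1]:
                totals[m] += avg_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def nemenyi_cd(k, n):
    """Critical difference in mean rank for k methods over n datasets."""
    return Q_005[k] * math.sqrt(k * (k + 1) / (6 * n))


# Assumed setting: 9 methods, 15 datasets.
cd = nemenyi_cd(9, 15)
```

Under these assumptions k(k+1)/(6N) = 90/90 = 1, so the critical difference equals q_alpha = 3.102: two methods would differ significantly only if their mean ranks are more than about 3.1 apart.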
Furthermore, both methods exhibited greater stability under fold perturbation than any of the tested chained-decision selectors. They also naturally maintain the class proportions of the original dataset without requiring additional label-aware mechanisms. Regarding aggregate G-Mean performance, the methods remained competitive, though they did not lead; TPS achieved a rank-1 frequency of 11.3%, while BoundaryTPS reached 9.9% across various fold combinations. Finally, empirical analysis confirms that both methods scale sub-quadratically with respect to sample size.
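Two quantities in this paragraph have simple computational definitions, sketched below under stated assumptions. G-Mean is taken here in its usual sense, the geometric mean of per-class recalls; the paper may use the binary sensitivity/specificity form, which coincides with this definition for two classes. The scaling exponent is estimated as the least-squares slope of log(runtime) against log(sample size), where a slope below 2 indicates sub-quadratic growth. The function names and the example numbers are illustrative and are not results from the paper.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1 / len(recalls))


def scaling_exponent(sizes, runtimes):
    """Least-squares slope of log(runtime) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in runtimes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Toy labels: recall is 3/4 on class 0 and 1/2 on class 1.
score = g_mean([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])

# Synthetic timings that grow as n**1.5, so the fitted slope is about 1.5.
sizes = [100, 200, 400, 800]
slope = scaling_exponent(sizes, [n ** 1.5 for n in sizes])
```

Because G-Mean multiplies recalls, a selector that discards most of a minority class is penalized heavily, which is why preserved class proportions matter for this metric.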
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





