arXiv

Prototype Selection Using Topological Data Analysis

Title: Leveraging Topological Data Analysis for Prototype Selection

Abstract:

Prototype selection techniques reduce the size of training datasets, yet existing taxonomies, which group them into condensation, edition, hybrid, competence-based, optimization-based, and clustering-based methods, do not account for approaches that exploit the multi-scale topological structure of the data. To address this gap, this study presents two novel persistence-based prototype selectors: the Topological Prototype Selector (TPS) and the Boundary-Conscious Topological Prototype Selector (BoundaryTPS).

TPS applies two successive Rips filtrations to identify and retain points that are either representative of the interior or critical to the boundaries. BoundaryTPS, by contrast, is a single-stage method: it uses a vertex-weighted filtration that favors retaining points near decision boundaries.
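To make the filtration idea concrete, the sketch below computes 0-dimensional ($H_0$) persistence pairs for a Rips filtration on a small point set, with an optional vertex weighting. It is not the authors' TPS or BoundaryTPS procedure. The function name `h0_persistence`, the `weights` argument, and the convention that an edge enters at the maximum of its two vertex weights and its length are illustrative assumptions. The sketch covers only $H_0$; the $H_1$ diagrams used in the evaluation require a full persistent-homology library.

```python
import itertools
import math

def h0_persistence(points, weights=None):
    """(birth, death) pairs of H0 for a (vertex-weighted) Rips filtration.

    Illustrative assumptions, not the paper's definition: vertex v enters at
    weights[v] (0.0 when no weights are given) and edge (u, v) enters at
    max(weights[u], weights[v], dist(u, v)). Some Rips conventions use
    dist / 2 instead, which only rescales the unweighted case.
    Assumes at least one point.
    """
    n = len(points)
    w = list(weights) if weights is not None else [0.0] * n
    parent = list(range(n))  # union-find forest; each root is its component's oldest vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All edges, sorted by the filtration value at which they appear.
    edges = sorted(
        (max(w[i], w[j], math.dist(points[i], points[j])), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for t, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge closes a cycle: no H0 event
        # Elder rule: the component born later dies; order by (birth, index).
        if (w[ri], ri) < (w[rj], rj):
            ri, rj = rj, ri  # ri is now the younger root
        pairs.append((w[ri], t))
        parent[ri] = rj  # the older root survives, preserving the invariant
    # The oldest component never dies.
    oldest = min(range(n), key=lambda v: w[v])
    pairs.append((w[oldest], math.inf))
    return pairs
```

For example, `h0_persistence([(0, 0), (0, 1), (5, 0)])` returns `[(0.0, 1.0), (0.0, 5.0), (0.0, inf)]`: the point at (0, 1) merges at scale 1, the isolated point at (5, 0) survives until scale 5, and one component never dies. Raising a vertex's weight delays its birth, which is one simple way a vertex-weighted filtration can change which points look topologically significant.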

We evaluated both methods against seven established baselines on fifteen real-world datasets. The results indicate that the topological approaches occupy a region of the prototype-selection design space distinct from that of existing techniques. BoundaryTPS obtained the lowest mean Friedman rank for preserving $H_1$ persistence diagrams and was significantly better than five of the seven baselines (Nemenyi test, $\alpha = 0.05$). TPS ranked third on the same metric.
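The Friedman/Nemenyi comparison can be sketched as follows: average each method's per-dataset rank (ties share the mean rank), then treat two methods as significantly different when their mean ranks differ by more than the critical difference $CD = q_\alpha \sqrt{k(k+1)/(6N)}$ for $k$ methods and $N$ datasets. The helper names are hypothetical, and the assumption that all nine methods (two proposed plus seven baselines) are ranked jointly is ours, not a detail stated in the abstract.

```python
import math

# Standard Nemenyi critical values at alpha = 0.05 (studentized range
# statistic divided by sqrt(2)), indexed by the number of methods k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}

def average_ranks(scores, higher_is_better=True):
    """Mean Friedman rank per method; scores[d][m] is method m's score on dataset d."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        # Method indices sorted from best to worst on this dataset.
        order = sorted(range(n_methods), key=lambda m: row[m],
                       reverse=higher_is_better)
        i = 0
        while i < n_methods:
            # Extend j over the run of methods tied with order[i].
            j = i
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # 1-based; tied methods share the mean rank
            for pos in range(i, j + 1):
                totals[order[pos]] += tied_rank
            i = j + 1
    return [t / len(scores) for t in totals]

def nemenyi_cd(k, n_datasets):
    """Critical difference in mean rank for k methods over n_datasets at alpha = 0.05."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))
```

Under the nine-method assumption, $k = 9$ and $N = 15$ give $CD = 3.102\sqrt{90/90} \approx 3.10$ mean-rank units. If the $H_1$ preservation score is a distance between diagrams (lower is better), call `average_ranks` with `higher_is_better=False`.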

Both methods were also more stable under fold perturbation than any of the chained-decision selectors tested, and both preserve the class proportions of the original dataset without an additional label-aware mechanism. On aggregate G-Mean they remained competitive but did not lead: across fold combinations, TPS achieved a rank-1 frequency of 11.3% and BoundaryTPS 9.9%. Finally, empirical analysis shows that both methods scale sub-quadratically with sample size.
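For multi-class problems, G-Mean is commonly defined as the geometric mean of per-class recalls; the abstract does not say which variant was used, so the sketch below assumes that definition. It also includes a simple check of how far a reduced set's class proportions drift from the original's, which is one way to quantify the class-proportion preservation described above. All function names are hypothetical.

```python
import math
from collections import Counter

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1.0 / len(recalls))

def class_proportions(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}

def proportion_drift(original_labels, reduced_labels):
    """Largest absolute change in any class's share after reduction.

    A class that is absent from the reduced set counts as having share 0.
    """
    p = class_proportions(original_labels)
    q = class_proportions(reduced_labels)
    return max(abs(p[c] - q.get(c, 0.0)) for c in p)
```

Because G-Mean multiplies per-class recalls, a selector that discards a minority class entirely drives the score to zero. This makes G-Mean a natural companion to the proportion-drift check.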


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
