arXiv

Semi-Supervised Hyperbolic Hierarchical Clustering with Set-Level Structural Priors

Abstract:

Semi-supervised hierarchical clustering seeks to construct a tree that aligns with inherent data patterns while respecting user-provided supervision. Typically, this supervision is provided at the leaf level, as pairwise must-link/cannot-link constraints or triplet-wise must-link-before constraints. While effective for managing local relationships between individual samples, such fine-grained supervision does not explicitly indicate which samples should aggregate into coherent subtrees. As a result, the non-leaf levels of the resulting tree may diverge from the structural organization indicated by ground-truth labels.

To overcome this limitation, we introduce a semi-supervised hyperbolic hierarchical clustering approach that leverages set-level structural priors. The core innovation lies in treating sets as the fundamental units for hierarchy learning. Each set comprises samples expected to stay together within a specific subtree. These sets are derived from leaf-level supervision together with a learned similarity structure that respects the existing constraints. By serving as soft structural priors for subtree-level guidance, these sets allow supervision to influence the formation of the non-leaf hierarchy, extending its impact beyond immediate leaf-level relations.
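As a rough illustration of how sets might be induced from leaf-level supervision, the sketch below groups samples by the transitive closure of must-link pairs using union-find and rejects groupings that violate a cannot-link pair. This is only an assumption about one simple way to form such sets: the abstract states that the sets also depend on a learned similarity structure, which is not modeled here, and the function name `build_sets` is hypothetical.

```python
from collections import defaultdict


def build_sets(n, must_link, cannot_link=()):
    """Group sample indices 0..n-1 into sets (hypothetical helper).

    Samples joined by a chain of must-link pairs end up in the same set.
    A cannot-link pair that falls inside one set raises ValueError.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two groups of every must-link pair.
    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Check that no cannot-link pair was merged into a single group.
    for a, b in cannot_link:
        if find(a) == find(b):
            raise ValueError(f"cannot-link pair ({a}, {b}) shares a set")

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


sets = build_sets(6, must_link=[(0, 1), (1, 2), (4, 5)], cannot_link=[(0, 4)])
print(sets)
```

Unconstrained samples remain singleton sets in this sketch; in the method described above they would presumably be assigned using the learned similarity structure.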

Our methodology proceeds in three stages: first, we learn embeddings consistent with constraints to achieve a reliable partitioning of samples; second, we construct constraint-induced sets and calculate inter-set similarities to establish set-level structural priors; and third, these priors are integrated into a hyperbolic hierarchy objective to facilitate continuous tree optimization. Evaluation across eleven benchmark datasets, complemented by ablation studies, demonstrates that our method consistently outperforms representative hierarchical clustering baselines in terms of label consistency, while also delivering superior similarity-based tree quality.
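Two ingredients of the second and third stages can be sketched in a few lines. The code below assumes, purely for illustration, that inter-set similarity is the mean pairwise similarity between members of two sets and that the hyperbolic embedding lives in the Poincaré ball; the abstract specifies neither choice, and the names `inter_set_similarity` and `poincare_distance` are hypothetical.

```python
import math


def inter_set_similarity(sim, set_a, set_b):
    """Mean pairwise similarity between two sets (assumed aggregation).

    `sim` is a square matrix (list of lists) of sample similarities.
    """
    total = sum(sim[i][j] for i in set_a for j in set_b)
    return total / (len(set_a) * len(set_b))


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


sim = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
print(inter_set_similarity(sim, [0], [1, 2]))
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
```

Distances in the Poincaré ball grow rapidly near the boundary, which is one common reason hyperbolic space is used for tree-like structure: leaves can sit near the boundary while internal nodes sit closer to the origin.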


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
