Semi-Supervised Hyperbolic Hierarchical Clustering with Set-Level Structural Priors
Abstract:
Semi-supervised hierarchical clustering seeks to construct a tree that reflects the inherent structure of the data while respecting user-provided supervision. Typically, this supervision is given at the leaf level, as pairwise must-link/cannot-link constraints or triplet-wise must-link-before constraints. While effective for local relationships between individual samples, such fine-grained supervision does not explicitly indicate which samples should be grouped into coherent subtrees. As a result, the internal (non-leaf) structure of the resulting tree may diverge from the organization indicated by ground-truth labels.
To address this limitation, we introduce a semi-supervised hyperbolic hierarchical clustering approach that leverages set-level structural priors. The core idea is to treat sets, rather than individual samples, as the fundamental units for hierarchy learning. Each set comprises samples expected to stay together within a single subtree. These sets are derived from leaf-level supervision together with a learned similarity structure that respects the given constraints. Acting as soft structural priors for subtree-level guidance, the sets let supervision shape the non-leaf hierarchy rather than only the immediate leaf-level relations.
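As a rough illustration of how leaf-level supervision could induce sets, the sketch below groups samples by the transitive closure of must-link pairs using union-find. This is only an assumption made for illustration: the abstract states that the sets also depend on a learned similarity structure, which is not modeled here, and the function name `must_link_sets` is hypothetical.

```python
from collections import defaultdict


def must_link_sets(n_samples, must_link):
    """Group sample indices into sets via the transitive closure of must-link pairs.

    Assumption: only must-link pairs are used; the learned similarity
    structure mentioned in the abstract is not modeled in this sketch.
    """
    parent = list(range(n_samples))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root index under the smaller one.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(n_samples):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

For example, with five samples and must-link pairs (0, 1) and (1, 2), samples 0, 1, and 2 fall into one set while samples 3 and 4 remain singletons.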
Our methodology proceeds in three stages: first, we learn embeddings consistent with constraints to achieve a reliable partitioning of samples; second, we construct constraint-induced sets and calculate inter-set similarities to establish set-level structural priors; and third, these priors are integrated into a hyperbolic hierarchy objective to facilitate continuous tree optimization. Evaluation across eleven benchmark datasets, complemented by ablation studies, demonstrates that our method consistently outperforms representative hierarchical clustering baselines in terms of label consistency, while also delivering superior similarity-based tree quality.
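Two ingredients of the second and third stages can be sketched under stated assumptions. The abstract does not say which hyperbolic model or which inter-set similarity is used, so the code below assumes the Poincaré ball model, whose standard distance is d(x, y) = arcosh(1 + 2‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²))), and an average-linkage style similarity between sets. The names `poincare_distance` and `inter_set_similarity` are hypothetical and are not taken from the paper.

```python
import math


def poincare_distance(x, y):
    """Distance between two points strictly inside the unit Poincare ball."""
    def sq_norm(v):
        return sum(c * c for c in v)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    return math.acosh(1.0 + 2.0 * diff / denom)


def inter_set_similarity(sim, set_a, set_b):
    """Mean pairwise similarity between two sets of sample indices.

    Assumption: average linkage over a precomputed similarity matrix `sim`;
    the paper may define inter-set similarity differently.
    """
    total = sum(sim[i][j] for i in set_a for j in set_b)
    return total / (len(set_a) * len(set_b))
```

In the Poincaré ball, the distance from the origin to a point at Euclidean radius r is ln((1 + r) / (1 − r)), so points near the boundary are far apart. This is one reason hyperbolic space is often used to embed tree-like structures.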
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





