arXiv

Unifying Information-Theoretic and Pair-Counting Clustering Similarity

Title: Bridging Information-Theoretic and Pair-Counting Clustering Similarity

Abstract:

Evaluating unsupervised models fundamentally relies on comparing clusterings, yet the many available similarity measures often yield conflicting or widely divergent results. These measures fall broadly into two families: pair-counting and information-theoretic. The former assesses agreement by examining pairs of elements, while the latter aggregates information over the full contingency table of the two clusterings. Although previous studies have noted similarities between these families and have applied empirical normalization or chance-correction techniques, the analytical link between them has not been fully characterized.
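To make the two families concrete, the following minimal Python sketch (standard library only) builds the contingency table of two labelings. It then computes the Rand index from pair counts and the mutual information, in nats, from the normalized table. The function names and the example labelings `u` and `v` are illustrative assumptions, not code from the paper.

```python
from collections import Counter
from math import comb, log


def contingency(labels_u, labels_v):
    """Return n_ij: how many elements lie in cluster i of U and cluster j of V."""
    if len(labels_u) != len(labels_v):
        raise ValueError("labelings must have the same length")
    return Counter(zip(labels_u, labels_v))


def rand_index(labels_u, labels_v):
    """Pair-counting view: fraction of element pairs on which U and V agree."""
    n = len(labels_u)
    table = contingency(labels_u, labels_v)
    same_both = sum(comb(c, 2) for c in table.values())            # together in U and V
    same_u = sum(comb(c, 2) for c in Counter(labels_u).values())   # together in U
    same_v = sum(comb(c, 2) for c in Counter(labels_v).values())   # together in V
    total = comb(n, 2)
    apart_both = total - same_u - same_v + same_both               # separated in both
    return (same_both + apart_both) / total


def mutual_information(labels_u, labels_v):
    """Information-theoretic view: MI in nats of the normalized contingency table."""
    n = len(labels_u)
    table = contingency(labels_u, labels_v)
    rows, cols = Counter(labels_u), Counter(labels_v)
    # sum_ij p_ij * log(p_ij / (a_i * b_j)), written with raw counts
    return sum((c / n) * log(n * c / (rows[i] * cols[j]))
               for (i, j), c in table.items())


u = [0, 0, 0, 1, 1, 1]
v = [0, 0, 1, 1, 2, 2]
print(rand_index(u, v), mutual_information(u, v))
```

For these toy labelings the Rand index is 10/15 and the mutual information is (2/3) ln 2. Both scores are read off the same four nonzero cells of the contingency table. They differ only in how each cell is weighted.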

This study introduces an analytical framework that unifies these two families through two distinct but complementary lenses. First, we demonstrate that both families can be formulated as weighted expansions contrasting observed co-occurrences with expected ones. Within this structure, pair-counting metrics represent a quadratic, low-order approximation, whereas information-theoretic measures serve as higher-order extensions that incorporate frequency weighting. Second, we extend the concept of pair-counting to k-tuple agreement, revealing that information-theoretic measures effectively accumulate higher-order co-assignment structures that extend beyond simple pairwise interactions.
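The k-tuple idea can be illustrated with a small sketch. It assumes that "k-tuple agreement" means counting k-element subsets that are co-clustered in both partitions; the paper's exact definition may differ. Under that assumption the observed rate is sum_ij C(n_ij, k) / C(N, k). The baseline multiplies the marginal k-tuple rates of the two partitions, treating the two co-clustering events as independent. For k = 2, the baseline multiplied by C(N, 2) equals the expected-index term of the Hubert and Arabie adjusted Rand index. The names `k_tuple_rate` and `tuple_contrast` are hypothetical.

```python
from collections import Counter
from math import comb


def k_tuple_rate(labels, k):
    """Fraction of k-element subsets whose members all share one cluster (cell)."""
    n = len(labels)
    return sum(comb(c, k) for c in Counter(labels).values()) / comb(n, k)


def tuple_contrast(labels_u, labels_v, k):
    """Observed vs. baseline rate of k-subsets co-clustered in both U and V.

    Hypothetical k-tuple generalization of pair counting; k = 2 gives the
    pair counts behind the Rand index.
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("labelings must have the same length")
    if not 2 <= k <= len(labels_u):
        raise ValueError("need 2 <= k <= number of elements")
    # Joint cells (i, j) play the role of the contingency-table entries n_ij.
    observed = k_tuple_rate(list(zip(labels_u, labels_v)), k)
    # Baseline: treat "co-clustered in U" and "co-clustered in V" as independent.
    expected = k_tuple_rate(labels_u, k) * k_tuple_rate(labels_v, k)
    return observed, expected


u = [0, 0, 0, 1, 1, 1]
v = [0, 0, 1, 1, 2, 2]
for k in (2, 3):
    print(k, tuple_contrast(u, v, k))
```

On the toy labelings, k = 2 gives an observed rate of 2/15 against a baseline of 0.08. For k = 3 both values are 0, because no cell of the joint table holds three elements. Larger k therefore registers agreement only inside large shared clusters.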

We apply these perspectives analytically to the Rand index and Mutual Information, and show how other indices in each family arise as natural extensions. Together, these results explain when and why the two families diverge, tracing their differing sensitivities to weighting schemes and approximation orders. The framework offers a rigorous foundation for selecting, interpreting, and extending clustering similarity measures across applications.
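One standard identity makes the link between the Rand index and an entropy-based quantity tangible; it is not necessarily the paper's derivation. Consider the order-2 Tsallis entropy S_2(p) = 1 - sum p^2, used in place of Shannon entropy. With it, the variation of information VI = 2H(U,V) - H(U) - H(V) becomes VI_2 = sum a_i^2 + sum b_j^2 - 2 sum p_ij^2, and 1 - RI = N/(N - 1) * VI_2. The sketch below checks this identity numerically. It also computes the Shannon VI = H(U) + H(V) - 2I(U;V) from the same labelings. All helper names are illustrative.

```python
from collections import Counter
from math import comb, log


def _probs(labels):
    """Empirical cluster probabilities of a labeling (or of zipped label pairs)."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]


def tsallis2(probs):
    """Order-2 Tsallis entropy 1 - sum p^2: chance two draws (with replacement) differ."""
    return 1.0 - sum(p * p for p in probs)


def shannon(probs):
    """Shannon entropy in nats."""
    return -sum(p * log(p) for p in probs if p > 0)


def variation_of_information(labels_u, labels_v, entropy):
    """VI = 2 H(U,V) - H(U) - H(V) for a chosen entropy functional."""
    joint = _probs(list(zip(labels_u, labels_v)))
    return 2 * entropy(joint) - entropy(_probs(labels_u)) - entropy(_probs(labels_v))


def rand_index(labels_u, labels_v):
    """Fraction of element pairs on which the two labelings agree."""
    n = len(labels_u)
    both = sum(comb(c, 2) for c in Counter(zip(labels_u, labels_v)).values())
    same_u = sum(comb(c, 2) for c in Counter(labels_u).values())
    same_v = sum(comb(c, 2) for c in Counter(labels_v).values())
    total = comb(n, 2)
    return (total - same_u - same_v + 2 * both) / total


u = [0, 0, 0, 1, 1, 1]
v = [0, 0, 1, 1, 2, 2]
n = len(u)
print(1 - rand_index(u, v), n / (n - 1) * variation_of_information(u, v, tsallis2))
print(variation_of_information(u, v, shannon))
```

Here both sides of the identity equal 1/3, while the Shannon VI is ln 3 - (1/3) ln 2. Swapping the quadratic term sum p^2 for the logarithmic term sum p ln p moves from the pair-based quantity to the frequency-weighted information-theoretic one. This is one way to read the abstract's "low-order approximation" claim.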


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
