Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization
Title: Maximizing Efficiency: Sampling Strategies for Empirical Pairwise Loss Estimation and Minimization
Abstract: In many machine learning problems, such as clustering, ranking, and similarity learning, empirical pairwise loss functions are computed over all pairs of observations, so their cost grows quadratically with the sample size and limits scalability. This study investigates, using principles of survey sampling, whether a resource-efficient strategy that uses only a subset of the available pairs can deliver estimation or optimization results on par with those obtained from all pairs. Theoretical analysis and experimental evidence both point to one key insight: effective sampling must select the pairs themselves rather than individual data points. In particular, for the high-dimensional vectors common in graph learning or computer vision, using auxiliary information to assign higher inclusion probabilities to more informative pairs yields performance nearly indistinguishable from full pairwise evaluation. The approach thus offers a theoretically grounded trade-off between computational efficiency and predictive accuracy.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
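
The abstract's central idea, sampling pairs (not points) with chosen inclusion probabilities, can be illustrated with a small sketch. The abstract does not state which estimator the authors use; the code below assumes a Horvitz-Thompson-style estimator under independent (Poisson) sampling of pairs, which is a standard survey sampling construction. The function names `full_pairwise_loss` and `ht_pairwise_loss`, the squared-distance loss, and the uniform probabilities in the usage lines are all hypothetical choices made for illustration.

```python
import numpy as np


def full_pairwise_loss(X, loss):
    """Average loss over all n*(n-1)/2 unordered pairs (quadratic cost)."""
    i, j = np.triu_indices(len(X), k=1)
    return loss(X[i], X[j]).mean()


def ht_pairwise_loss(X, loss, probs, rng):
    """Horvitz-Thompson-style estimate of the full pairwise average.

    Assumption: each pair k is kept independently with probability
    probs[k] > 0 (Poisson sampling). The loss is evaluated only on the
    kept pairs, and each value is reweighted by 1 / probs[k].
    """
    i, j = np.triu_indices(len(X), k=1)
    keep = rng.random(len(i)) < probs           # sample pairs, not points
    values = loss(X[i[keep]], X[j[keep]])       # loss on sampled pairs only
    return np.sum(values / probs[keep]) / len(i)


def squared_distance(a, b):
    """Illustrative pairwise loss: squared Euclidean distance per row."""
    return np.sum((a - b) ** 2, axis=1)


# Usage: estimate the pairwise average from roughly 30% of the pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
n_pairs = len(X) * (len(X) - 1) // 2
probs = np.full(n_pairs, 0.3)                   # uniform inclusion probabilities
estimate = ht_pairwise_loss(X, squared_distance, probs, rng)
exact = full_pairwise_loss(X, squared_distance)
```

Passing non-uniform `probs`, for example larger values for pairs that some cheap auxiliary signal marks as informative, keeps the estimate unbiased under these assumptions while changing its variance. For simplicity this sketch still enumerates all pair indices; only the loss evaluations are restricted to the sample.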





