Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures
Title: Leveraging Sum-of-Squares for Dimension Reduction and Enhanced Clustering in Non-Spherical Mixtures
Abstract: We develop a new approach for clustering non-spherical Gaussian mixtures, that is, mixtures whose components have arbitrary covariances. At its core is a subroutine, based on the sum-of-squares method, that finds a low-dimensional projection of the input data under which the components remain separated. This subroutine is a non-spherical analog of the classical dimension reduction technique based on singular value decomposition, a key ingredient in the spherical clustering algorithm of Vempala and Wang [VW04].
We present two main algorithmic results. First, we give an algorithm that clusters a mixture of $k$ centered (zero-mean) Gaussians with arbitrary covariances in $\operatorname{poly}(n)$ time, given $n \geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples. Second, we give an algorithm that clusters a mixture of $k$ Gaussians sharing an identical but unknown arbitrary covariance in $n^{O(\log w_{\min}^{-1})}$ time, given $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples. In both cases, $w_{\min}$ is the smallest mixing weight in the mixture and $f$ is a function that does not depend on the dimension $d$. Both algorithms are robust: they tolerate a dimension-independent fraction of arbitrary outliers.
Prior to this work, the best-known algorithms for non-spherical clustering required $d^{O(k)} f(w_{\min}^{-1})$ samples and running time. Our results should be read against the known $d^{\Omega(k)}$ statistical query and sum-of-squares lower bounds of [DKS17, DKPP24] for non-spherical Gaussian mixtures. Such lower bounds are often interpreted as ruling out algorithms whose cost is $d^{o(k)}$, but our results show that they can be circumvented for a broad and significant class of Gaussian mixtures.
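
For orientation, the classical SVD-based dimension reduction that the abstract cites as its spherical counterpart can be sketched as follows. This is only a minimal illustration of the spherical baseline in the spirit of [VW04], not the sum-of-squares subroutine of this paper. The function names (`svd_project`, `sample_spherical_mixture`) and all parameter values are hypothetical choices made for the example.

```python
import numpy as np


def svd_project(X, k):
    """Project the rows of X onto the span of its top-k right singular vectors."""
    # X has shape (n, d): n samples in d dimensions.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:k].T  # (d, k) matrix with orthonormal columns
    return X @ V, V


def sample_spherical_mixture(n, means, sigma, rng):
    """Draw n points from a uniform mixture of spherical Gaussians (illustrative)."""
    k, d = means.shape
    labels = rng.integers(0, k, size=n)
    X = means[labels] + sigma * rng.standard_normal((n, d))
    return X, labels


# Hypothetical toy instance: two well-separated spherical components in d = 50.
rng = np.random.default_rng(0)
d, k, n = 50, 2, 2000
means = np.zeros((k, d))
means[0, 0] = 10.0
means[1, 0] = -10.0

X, labels = sample_spherical_mixture(n, means, sigma=1.0, rng=rng)
Y, V = svd_project(X, k)  # Y has shape (n, k)

# Distance between the projected cluster means; the original means are 20 apart.
gap = np.linalg.norm(Y[labels == 0].mean(axis=0) - Y[labels == 1].mean(axis=0))
print(Y.shape, gap)
```

In this spherical toy case the top singular directions capture the span of the component means, so the projected clusters stay about as far apart as the original ones. With arbitrary covariances, directions of large variance need not align with directions that separate the components, which is the gap the sum-of-squares subroutine described above is meant to close.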
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





