On the Theoretical Limitations of Embedding-based Link Prediction
Abstract: Neural architectures frequently map low-dimensional representations to high-dimensional output spaces. In many cases, the final layer is a linear transformation, which introduces a "rank bottleneck" that restricts the functional expressivity of the model. This limitation is especially common in link prediction models, including knowledge graph embeddings (KGEs), where the number of entities in the output space often exceeds the embedding dimension by several orders of magnitude. This study examines how such rank bottlenecks constrain a model's ability to fit its training data. Whereas prior work established sufficient bounds on the embedding dimension for specific KGEs, we derive necessary bounds that apply to all KGEs with linear output layers and show that these bounds grow with both graph size and connectivity. To circumvent this bottleneck without substantial parameter overhead, we explore non-linear output layers based on mixtures. Our empirical results confirm that models with this non-linear output layer achieve better ranking performance and probabilistic fit on large, dense datasets, in line with our theoretical predictions. Overall, this work highlights the restrictive nature of linear output layers in KGEs and advocates non-linear alternatives for scaling to large, complex graphs.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
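As a rough illustration of the rank bottleneck described in the abstract, the sketch below builds a toy score matrix for many (head, relation) queries against many candidate entities. A linear output layer yields a logit matrix of rank at most d, and log-softmax only subtracts a per-row constant, so the log-probability matrix has rank at most d + 1. For contrast, it uses a mixture of softmaxes in the style of Yang et al. (2018) as one possible mixture-based non-linear output layer; this is an assumed stand-in, not necessarily the layer studied in the paper. All sizes, the random weights, and the helper names (`linear_logits`, `mixture_log_probs`, `log_softmax`) are hypothetical, and the code only demonstrates the rank effect, not the necessary bounds derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration): many more entities than embedding dimensions.
num_queries, num_entities, dim, num_components = 200, 500, 8, 4

# Hypothetical encoder outputs: one dim-dimensional vector per (head, relation) query,
# and one embedding per candidate tail entity. Random values stand in for trained weights.
queries = rng.normal(size=(num_queries, dim))
entity_emb = rng.normal(size=(num_entities, dim))


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def linear_logits(q, e):
    """Linear output layer: each entity's score is a dot product, so logits = q @ e.T."""
    return q @ e.T


def mixture_log_probs(q, e, proj, gate):
    """Mixture of softmaxes in the style of Yang et al. (2018), used here as an
    assumed stand-in for a mixture-based non-linear output layer.

    proj: (K, dim, dim) per-component query projections.
    gate: (dim, K) weights producing the mixing probabilities pi_k(q).
    Returns log sum_k pi_k(q) * softmax(tanh(q W_k) @ e.T), shape (Q, N).
    """
    comp_q = np.tanh(np.einsum("qd,kde->kqe", q, proj))   # (K, Q, dim)
    comp_log_p = log_softmax(comp_q @ e.T, axis=-1)        # (K, Q, N)
    log_pi = log_softmax(q @ gate, axis=-1).T[:, :, None]  # (K, Q, 1)
    stacked = log_pi + comp_log_p
    peak = stacked.max(axis=0)                             # stable log-sum-exp over K
    return peak + np.log(np.exp(stacked - peak).sum(axis=0))


proj = rng.normal(size=(num_components, dim, dim)) / np.sqrt(dim)
gate = rng.normal(size=(dim, num_components))

lin = linear_logits(queries, entity_emb)
lin_rank = np.linalg.matrix_rank(lin)                    # at most dim
lin_logp_rank = np.linalg.matrix_rank(log_softmax(lin))  # at most dim + 1 (per-row shift)
mix_logp = mixture_log_probs(queries, entity_emb, proj, gate)
mix_rank = np.linalg.matrix_rank(mix_logp)

print("linear logits rank:", lin_rank)
print("linear log-prob rank:", lin_logp_rank)
print("mixture log-prob rank:", mix_rank)
```

Comparing the rank of the log-probability matrix, rather than of raw scores, matches how a softmax output layer is trained. With only a few components and a modest number of extra parameters, the mixture's log-probability matrix is generically well above rank d + 1, whereas no choice of linear-layer weights can exceed that limit.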