Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics
Title: Predicting Scientific Breakthroughs via Explainable Analysis of Concept Network Dynamics
Abstract: This study presents a transparent machine-learning framework for anticipating the structural foundations of scientific breakthroughs, defined as the emergence and strengthening of connections between research concepts. Analyzing the temporal evolution of OpenAlex concept networks, we train a two-stage LightGBM model on 59 semantic and topological features. The first stage predicts whether a link will form between a pair of concepts; the second, a regression stage, estimates the future weight of that connection to quantify its expected intensity.
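The two-stage design described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `make_targets` and `two_stage_predict`, the 0.5 gating threshold, and the `log1p` transform of the weight target are all assumptions (the transform is chosen only because the abstract reports RMSLE), and the `classify` and `regress` callables stand in for trained LightGBM models.

```python
import math
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]


def make_targets(
    future_weights: Dict[Pair, float], candidates: Iterable[Pair]
) -> Dict[Pair, Tuple[int, float]]:
    """Build (link_exists, log1p(weight)) targets for each candidate pair.

    A pair absent from the future snapshot is treated as weight 0 (assumption).
    """
    targets = {}
    for pair in candidates:
        weight = future_weights.get(pair, 0.0)
        targets[pair] = (1 if weight > 0 else 0, math.log1p(weight))
    return targets


def two_stage_predict(
    features: dict,
    classify: Callable[[dict], float],
    regress: Callable[[dict], float],
    threshold: float = 0.5,  # hypothetical gating threshold
) -> Tuple[float, float]:
    """Stage 1 scores link existence; stage 2 runs only if the link is predicted."""
    p_link = classify(features)
    if p_link < threshold:
        return p_link, 0.0
    # The regressor is assumed to predict log1p(weight); invert the transform.
    return p_link, math.expm1(regress(features))
```

Gating the regressor on the classifier keeps the weight estimate from being dominated by the many concept pairs that never connect, which is one common motivation for splitting existence and intensity into separate stages.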
Our method improves on prior approaches in both predictive accuracy and interpretability. Validated across four distinct biomedical and technology sectors, the classification stage achieves ROC-AUC scores of 0.954 to 0.967 across all time horizons without model re-tuning, compared with an AUC of around 0.90 typical of previous models. The regression stage remains stable, with an RMSLE between 0.45 and 0.60 over one- to five-year horizons. The approach also preserves transparency by relying on structural, auditable features rather than black-box embeddings.
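For readers unfamiliar with the two reported metrics, the sketch below computes them from their standard definitions using only the Python standard library. The function names are illustrative; in practice a library such as scikit-learn would normally be used, and nothing here reproduces the paper's evaluation code or data.

```python
import math
from typing import Sequence


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error: sqrt(mean((log1p(pred) - log1p(true))^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute error, which suits link weights that can span several orders of magnitude. The pairwise AUC above is quadratic in the number of samples and is meant for illustration only.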
Feature attribution analysis reveals that structural metrics, specifically Adamic-Adar similarity and degree-based Hadamard measures, are the primary drivers of accuracy. This suggests that breakthrough-related recombinations tend to arise within densely connected sub-networks. To illustrate the model’s utility, we examine two cases anchored in expert knowledge: quantum annealing and AI-enabled quantum architectures. In both cases, the model identified technological convergences that aligned with expert insights. Finally, we propose a three-tier decision architecture (detection, expert translation, and institutional integration) to transform these forecasts into data-driven research strategies and policies, grounded in open data and explainable features.
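The two structural features named above can be illustrated on a toy graph. Adamic-Adar follows its standard definition (the sum of 1/log(degree) over common neighbors). The abstract does not define its "degree-based Hadamard measures", so the product of the two endpoint degrees used here is only one plausible reading and should be treated as an assumption. The helper names and the toy edge list are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list, ignoring self-loops."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def adamic_adar(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Sum of 1/log(deg(z)) over common neighbors z of u and v."""
    common = adj.get(u, set()) & adj.get(v, set())
    # Guard against degree-1 nodes, where log(1) = 0 would divide by zero.
    return sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)


def degree_hadamard(adj: Dict[str, Set[str]], u: str, v: str) -> int:
    """Element-wise (Hadamard) product of endpoint degrees: deg(u) * deg(v)."""
    return len(adj.get(u, set())) * len(adj.get(v, set()))


# Toy graph: "a" and "b" are not linked but share the neighbors "c" and "d".
toy_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "d")]
toy_adj = build_adjacency(toy_edges)
print(adamic_adar(toy_adj, "a", "b"), degree_hadamard(toy_adj, "a", "b"))
```

Both scores grow when a candidate pair sits in a well-connected neighborhood, which is consistent with the abstract's observation that recombinations tend to arise within densely connected sub-networks.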
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



