Data-Driven Spectral Prediction for Accelerating Large-Scale Electronic Structure Calculations
Title: Enhancing Large-Scale Electronic Structure Calculations Through Data-Driven Spectral Prediction
Abstract:
Simulating molecular systems with thousands of atoms demands highly scalable methods. Although contemporary Density Functional Theory (DFT) codes achieve linear scaling, solving the resulting large, sparse generalized eigenproblems remains a significant computational bottleneck on exascale architectures. Within the LimitX project, we introduce a data-driven approach to accelerate these calculations.
To circumvent the dimensionality hurdles inherent in large-scale spectral prediction, we shift the machine learning objective from predicting discrete eigenvalues to estimating the coefficients of an interpolating Chebyshev polynomial. We also evaluate all-atom and fragment-based structural representations to determine the most effective input format. We compare three machine learning models (Graph Neural Networks, Random Forests, and Kernel Ridge Regression) on a newly compiled 2 TB dataset of protein dimers.
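The change of learning target can be illustrated with a minimal sketch. The abstract does not state which function of the spectrum is interpolated, so the code below assumes, purely for illustration, that the sorted eigenvalues are treated as a curve over a normalized index in [-1, 1]. The function names `spectrum_to_cheb_coeffs` and `cheb_coeffs_to_spectrum` and the chosen degree are hypothetical and are not taken from the paper or from BigDFT.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def spectrum_to_cheb_coeffs(eigvals, degree):
    """Compress a spectrum of any size into degree + 1 Chebyshev coefficients.

    Assumption (not from the source): the sorted eigenvalues are viewed as a
    curve lambda(x) over a normalized index x in [-1, 1].
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))
    grid = np.linspace(-1.0, 1.0, lam.size)
    # Chebyshev points of the first kind serve as interpolation nodes.
    nodes = C.chebpts1(degree + 1)
    # Sample the eigenvalue curve at the nodes (linear interpolation in index).
    samples = np.interp(nodes, grid, lam)
    # Square Vandermonde system: the solution interpolates the samples exactly.
    vander = C.chebvander(nodes, degree)
    return np.linalg.solve(vander, samples)


def cheb_coeffs_to_spectrum(coeffs, n_eigs):
    """Evaluate the polynomial on n_eigs equispaced indices to recover a spectrum."""
    grid = np.linspace(-1.0, 1.0, n_eigs)
    return C.chebval(grid, coeffs)


# Synthetic, smooth, monotone "spectrum" used only to exercise the functions.
x = np.linspace(-1.0, 1.0, 2000)
synthetic_spectrum = x**3 + x
coeffs = spectrum_to_cheb_coeffs(synthetic_spectrum, degree=8)
recovered = cheb_coeffs_to_spectrum(coeffs, synthetic_spectrum.size)
print("coefficients:", coeffs.size)
print("max reconstruction error:", np.max(np.abs(recovered - synthetic_spectrum)))
```

Under this assumption the regression target has a fixed length (`degree + 1`) regardless of the matrix dimension, which is one way a model could sidestep predicting thousands of individual eigenvalues.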
The spectra generated by these models serve as high-quality initial guesses, enabling BigDFT to effectively skip early Self-Consistent Field (SCF) iterations. Looking ahead, these spectral predictors are intended to be integrated into dynamic optimization routines for rational filter-based eigensolvers, such as FrASE, which is currently in its early stages of development.
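To make the term "rational filter" concrete, the sketch below evaluates a generic textbook filter obtained by applying the trapezoidal rule to a contour integral over a circle. It is not FrASE, whose design the abstract does not describe. The function name `rational_filter` and the parameters `center`, `radius`, and `n_poles` are hypothetical. A predicted spectrum could plausibly inform the choice of `center` and `radius`, but that link is an assumption here rather than a description of the planned integration.

```python
import numpy as np


def rational_filter(lam, center, radius, n_poles=8):
    """Evaluate a trapezoidal-rule rational filter at real points lam.

    The filter is close to 1 inside [center - radius, center + radius] and
    decays towards 0 outside, approximating a spectral projector.
    """
    lam = np.asarray(lam, dtype=float)
    # Midpoint-shifted angles keep all poles off the real axis for even n_poles.
    theta = 2.0 * np.pi * (np.arange(n_poles) + 0.5) / n_poles
    offsets = radius * np.exp(1j * theta)  # pole positions relative to center
    poles = center + offsets
    # Sum of simple poles: (1/N) * sum_j offset_j / (pole_j - lam)
    terms = offsets[:, None] / (poles[:, None] - lam[None, :])
    return terms.sum(axis=0).real / n_poles


# Hypothetical search interval [-1, 1]; values illustrate the filter shape only.
points = np.array([0.0, 1.0, 3.0])
print(rational_filter(points, center=0.0, radius=1.0))
```

For this particular quadrature the sum has the closed form 1 / (1 + ((lam - center) / radius)^n_poles), so the filter equals 1/2 at the interval endpoints and sharpens as `n_poles` grows.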
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





