Structure-Aware Prediction of PROTAC-Mediated Protein Degradability via Graph Neural Networks
Title: Leveraging Graph Neural Networks for Structure-Aware Prediction of PROTAC-Mediated Protein Degradability
Proteolysis-targeting chimeras (PROTACs) offer a mechanism to selectively eliminate disease-causing proteins, but predicting which proteins are susceptible to degradation remains a significant hurdle. Current computational approaches rely on the complete molecular structure of the PROTAC, which is unavailable before synthesis. To address this, we introduce DegradoMap, a graph neural network that predicts PROTAC-mediated degradability from protein structure and E3 ligase identity alone, both of which are available at the earliest stages of target selection.
DegradoMap incorporates biophysical priors through a lysine-weighted graph pooling strategy combined with per-protein normalization. It assesses protein-E3 compatibility via cross-attention and integrates cellular context derived from the Cancer Dependency Map. On the PROTAC-8K benchmark (3,101 samples across 155 targets and 10 E3 ligases), the model achieved an AUROC of 0.646 ± 0.124 in target-unseen evaluation, with the best seed reaching 0.7449. In CRBN-to-VHL E3-unseen transfer, it attained an AUROC of 0.811, surpassing both graph neural network and machine learning baselines. The model also identified optimal E3 ligases with a Hit@3 accuracy of 74%.
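To make the idea of lysine-weighted pooling with per-protein normalization concrete, the following is a minimal sketch and not DegradoMap's actual implementation. It assumes each residue receives a fixed logit (the hypothetical parameter `lysine_logit` for lysine, 0 otherwise) and that a softmax over the residues of a single protein yields weights summing to 1. The function name, the parameter, and the fixed-logit scheme are all illustrative assumptions; the real model may learn these weights or normalize differently.

```python
import math


def lysine_weighted_pool(node_feats, residues, lysine_logit=1.0):
    """Pool per-residue feature vectors into one protein-level vector.

    Assumption (not from the paper): lysine ('K') residues get a logit of
    `lysine_logit`, all other residues get 0, and a softmax over the
    residues of this one protein converts logits to weights that sum to 1.
    """
    if not node_feats or len(node_feats) != len(residues):
        raise ValueError("need exactly one feature vector per residue")

    logits = [lysine_logit if r == "K" else 0.0 for r in residues]

    # Numerically stable softmax restricted to this protein's residues.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of residue features, one output value per feature dim.
    dim = len(node_feats[0])
    pooled = [
        sum(w * feats[d] for w, feats in zip(weights, node_feats))
        for d in range(dim)
    ]
    return pooled, weights
```

Because the weights are normalized within each protein, the pooled vector does not grow with sequence length or lysine count; only the relative emphasis on lysine residues changes. Setting `lysine_logit=0.0` reduces this sketch to ordinary mean pooling.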
The study highlights two broader insights: first, E(3)-equivariant architectures performed worse than simpler invariant designs for this specific scalar prediction task; second, while ESM-2 embeddings enhanced peak performance, this improvement was contingent upon careful regularization, as naive integration proved ineffective. DegradoMap serves as a pre-synthesis tool for assessing degradability, offering well-calibrated confidence scores (ECE = 0.029 in target-unseen settings) that allow researchers to prioritize high-confidence predictions for experimental validation. However, due to high seed variance (std = 0.124) and restricted E3 coverage, ensembling is recommended to ensure reliable deployment.
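For readers unfamiliar with the calibration metric, the sketch below shows one common way to compute expected calibration error (ECE) for a binary classifier, using equal-width bins over the predicted probability of the positive class. The bin count and binning scheme used for the reported value are not stated here, so `n_bins=10` and the function name are assumptions for illustration only.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary predictions.

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    n_bins : number of equal-width bins (assumed; not taken from the paper)
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        # Each bin's calibration gap is weighted by its share of samples.
        ece += (len(bucket) / n) * abs(mean_conf - frac_pos)
    return ece
```

A value near 0 means that predicted probabilities match observed positive rates within each bin; for example, predictions of 0.8 that are correct four times out of five contribute no error.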
Source: arXiv




