Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining
Abstract: High-quality molecular representations are critical for molecular design and property prediction, but large labeled datasets remain scarce. Self-supervised pretraining on molecular graphs is a viable strategy, yet current methods have two main limitations: they often depend on intricate generative objectives or hand-crafted augmentations, and they predominantly use only 2D topological information, neglecting valuable 3D structural information. To bridge this gap, we present C-FREE (Contrast-Free Representation learning on Ego-nets), a simple framework that integrates 2D graphs with ensembles of 3D conformers. C-FREE learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space, using fixed-radius ego-nets as the modeling units across different conformers. This design allows geometric and topological information to be integrated within a hybrid Graph Neural Network (GNN)-Transformer backbone, without negative samples, positional encodings, or expensive pre-processing. Pretrained on the GEOM dataset, which provides rich 3D conformational diversity, C-FREE achieves state-of-the-art results on MoleculeNet, outperforming both contrastive and generative multimodal self-supervised methods. Furthermore, fine-tuning experiments on datasets of varying size and molecular composition show that the pretrained model transfers effectively to new chemical domains, underscoring the value of 3D-aware molecular representations.
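To make the notion of a fixed-radius ego-net and its complementary neighborhood concrete, the following is a minimal sketch and not the authors' implementation. It assumes the molecular graph is given as a plain adjacency dictionary, and it assumes the "complement" is simply every node outside the ego-net; the abstract does not specify the exact construction. The names `ego_net`, `split_ego_and_complement`, and the toy graph `adj` are hypothetical. In the actual framework, a learned encoder would embed both node sets and a predictor would map the complement's embedding to the ego-net's embedding; neither is shown here.

```python
from collections import deque


def ego_net(adj, center, radius):
    """Return the set of nodes within `radius` hops of `center` (plain BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            # Do not expand beyond the fixed radius.
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def split_ego_and_complement(adj, center, radius):
    """Split nodes into an ego-net (prediction target) and the rest (context).

    Assumption: the complementary neighborhood is all nodes outside the ego-net.
    """
    ego = ego_net(adj, center, radius)
    complement = set(adj) - ego
    return ego, complement


# Hypothetical toy graph: a 6-atom chain 0-1-2-3-4-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

ego, complement = split_ego_and_complement(adj, center=2, radius=1)
# ego == {1, 2, 3}; complement == {0, 4, 5}
```

Because the split depends only on graph connectivity, the same node partition could in principle be reused for every 3D conformer of a molecule, with only the atom coordinates changing between conformers.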
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



